# Data Science Capstone
## Brian Dunn

##### Problem Statement:  A person might currently be living in New York City, but considering a move to Toronto.  Assuming that their current neighborhood in NYC is East Harlem, which neighborhoods in Toronto would be the best fit for them?

##### Needed Data: Longitude and latitude data for neighboorhoods in Toronto.  Longitude and latitude data for East Harlem.  Venue data for the listed neighborhoods

##### Solution Statement/Methodology: Use K-means clustering to determine which neighborhoods exhibit similar characteristics according to Foursquare's venue data.  List which neighborhoods in Toronto are in the same cluster as East Harlem, NYC.

###### Data Sources

Longitude and latitude data: Scaped from wikipedia

Venue data: Foursquare

### Results: The end list of clustered neighborhoods shows that neighborhoods in cluster 2 exhibit similar characteristics to East Harlem, NYC. Someone moving from East Harlem to Toronto might use this list to narrow down their search of new neighborhoods in Toronto.

### Discussion/Conclusion (Also shown at the end of this report): Using venue data may not be the perfect way to capture the essence of a neighborhood, but it would provide a valid initial indication of which neighborhoods offer similar goods and services.  Given that such establishments exist to cater mainly to the local residents, this can serve as a good proxy for the "vibe" of a particular neighborhood.  Future improvements to this model could also include events data to capture additional information about each neighborhood. 

## Getting Data

In [1]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ca-certificates-2020.4.5.2 |       hecda079_0         147 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    certifi-2020.4.5.2         |   py36h9f0ad1d_0         152 KB  conda-forge
    ------------------------------------------------------------
                       

In [2]:
import pandas as pd
import json
import requests
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import folium

In [3]:
# The code was removed by Watson Studio for sharing.

In [4]:
wiki = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
parse = BeautifulSoup(wiki, 'html.parser')

In [5]:
postalCodeList = []
boroughList = []
neighborhoodList = []

for row in parse.find('table').find_all('tr'):
    cells = row.find_all('td')
    if(len(cells) > 0):
        postalCodeList.append(cells[0].text)
        boroughList.append(cells[1].text)
        neighborhoodList.append(cells[2].text.rstrip('\n'))

In [6]:
torontoDf = pd.DataFrame({"PostalCode": postalCodeList,
                           "Borough": boroughList,
                           "Neighborhood": neighborhoodList})

torontoDf.head()


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A\n,Not assigned\n,Not assigned
1,M2A\n,Not assigned\n,Not assigned
2,M3A\n,North York\n,Parkwoods
3,M4A\n,North York\n,Victoria Village
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront"


In [7]:
torontoDf = torontoDf.replace('\n','', regex=True)
torontoDf.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [8]:
torontoDfDrop = torontoDf[torontoDf.Borough != "Not assigned"].reset_index(drop=True)
torontoDfGrouped = torontoDfDrop.groupby(["PostalCode", "Borough"], as_index=False).agg(lambda x: ", ".join(x))

for index, row in torontoDfGrouped.iterrows():
    if row["Neighborhood"] == "Not assigned":
        row["Neighborhood"] = row["Borough"]

In [9]:
coordinates = pd.read_csv('https://cocl.us/Geospatial_data')
coordinates.rename(columns={"Postal Code": "PostalCode"}, inplace=True)
coordinates.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
torontoDfNew = torontoDfGrouped.merge(coordinates, on="PostalCode", how="left")
torontoDfNew.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [12]:
borough_names = list(torontoDfNew.Borough.unique())

borough_with_toronto = []

for x in borough_names:
    if "toronto" in x.lower():
        borough_with_toronto.append(x)
        
borough_with_toronto

['East Toronto', 'Central Toronto', 'Downtown Toronto', 'West Toronto']

In [13]:
torontoDfNew = torontoDfNew[torontoDfNew['Borough'].isin(borough_with_toronto)].reset_index(drop=True)
print(torontoDfNew.shape)
torontoDfNew.head()

(39, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [25]:
torontoHarlemDf=torontoDfNew

In [26]:
torontoHarlemDf=torontoHarlemDf.append({'PostalCode':'10029', 'Borough':'Manhattan', 'Neighborhood':'East Harlem', 'Latitude':40.7957, 'Longitude':-73.9389}, ignore_index=True)

In [28]:
torontoHarlemDf.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [29]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough, neighborhood in zip(torontoHarlemDf['Latitude'], torontoHarlemDf['Longitude'], torontoHarlemDf['PostalCode'], torontoHarlemDf['Borough'], 
                                                  torontoHarlemDf['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        clientID,
        clientSecret,
        version,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [31]:
print(venues)

[('M4E', 'East Toronto', 'The Beaches', 43.67635739999999, -79.2930312, 'Glen Manor Ravine', 43.67682094413784, -79.29394208780985, 'Trail'), ('M4E', 'East Toronto', 'The Beaches', 43.67635739999999, -79.2930312, 'The Big Carrot Natural Food Market', 43.678879, -79.297734, 'Health Food Store'), ('M4E', 'East Toronto', 'The Beaches', 43.67635739999999, -79.2930312, 'Grover Pub and Grub', 43.679181434941015, -79.29721535878515, 'Pub'), ('M4E', 'East Toronto', 'The Beaches', 43.67635739999999, -79.2930312, 'Upper Beaches', 43.68056321147582, -79.2928688743688, 'Neighborhood'), ('M4K', 'East Toronto', 'The Danforth West, Riverdale', 43.6795571, -79.352188, 'MenEssentials', 43.677820068604575, -79.35126543045044, 'Cosmetics Shop'), ('M4K', 'East Toronto', 'The Danforth West, Riverdale', 43.6795571, -79.352188, 'Pantheon', 43.67762124481265, -79.35143390043564, 'Greek Restaurant'), ('M4K', 'East Toronto', 'The Danforth West, Riverdale', 43.6795571, -79.352188, 'Cafe Fiorentina', 43.677743, -

In [32]:
venues_df = pd.DataFrame(venues)


venues_df.columns = ['PostalCode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1672, 9)


Unnamed: 0,PostalCode,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,East Toronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,East Toronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop


In [33]:
venues_df.groupby(["PostalCode", "Borough", "Neighborhood"]).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
PostalCode,Borough,Neighborhood,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
10029,Manhattan,East Harlem,59,59,59,59,59,59
M4E,East Toronto,The Beaches,4,4,4,4,4,4
M4K,East Toronto,"The Danforth West, Riverdale",43,43,43,43,43,43
M4L,East Toronto,"India Bazaar, The Beaches West",19,19,19,19,19,19
M4M,East Toronto,Studio District,40,40,40,40,40,40
M4N,Central Toronto,Lawrence Park,3,3,3,3,3,3
M4P,Central Toronto,Davisville North,8,8,8,8,8,8
M4R,Central Toronto,"North Toronto West, Lawrence Park",20,20,20,20,20,20
M4S,Central Toronto,Davisville,34,34,34,34,34,34
M4T,Central Toronto,"Moore Park, Summerhill East",4,4,4,4,4,4


In [34]:
# one hot encoding
torontoHarlem_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
torontoHarlem_onehot['PostalCode'] = venues_df['PostalCode'] 
torontoHarlem_onehot['Borough'] = venues_df['Borough'] 
torontoHarlem_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(torontoHarlem_onehot.columns[-3:]) + list(torontoHarlem_onehot.columns[:-3])
torontoHarlem_onehot = torontoHarlem_onehot[fixed_columns]

print(torontoHarlem_onehot.shape)
torontoHarlem_onehot.head()

(1672, 243)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4K,East Toronto,"The Danforth West, Riverdale",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
torontoHarlem_grouped = torontoHarlem_onehot.groupby(["PostalCode", "Borough", "Neighborhoods"]).mean().reset_index()

print(torontoHarlem_grouped.shape)
torontoHarlem_grouped

(40, 243)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,10029,Manhattan,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0
1,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4K,East Toronto,"The Danforth West, Riverdale",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256
3,M4L,East Toronto,"India Bazaar, The Beaches West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025
5,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4P,Central Toronto,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M4R,Central Toronto,"North Toronto West, Lawrence Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
8,M4S,Central Toronto,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4T,Central Toronto,"Moore Park, Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = torontoHarlem_grouped['PostalCode']
neighborhoods_venues_sorted['Borough'] = torontoHarlem_grouped['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = torontoHarlem_grouped['Neighborhoods']

for ind in np.arange(torontoHarlem_grouped.shape[0]):
    row_categories = torontoHarlem_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted

(40, 13)


Unnamed: 0,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10029,Manhattan,East Harlem,Pizza Place,Mexican Restaurant,Bakery,Pharmacy,Italian Restaurant,Taco Place,Spanish Restaurant,Donut Shop,Café,Latin American Restaurant
1,M4E,East Toronto,The Beaches,Health Food Store,Pub,Trail,Neighborhood,Cosmetics Shop,Distribution Center,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
2,M4K,East Toronto,"The Danforth West, Riverdale",Greek Restaurant,Italian Restaurant,Coffee Shop,Furniture / Home Store,Ice Cream Shop,Restaurant,Bookstore,Grocery Store,Brewery,Bubble Tea Shop
3,M4L,East Toronto,"India Bazaar, The Beaches West",Fast Food Restaurant,Restaurant,Movie Theater,Brewery,Liquor Store,Fish & Chips Shop,Sandwich Place,Steakhouse,Italian Restaurant,Sushi Restaurant
4,M4M,East Toronto,Studio District,Café,Coffee Shop,Gastropub,Bakery,Brewery,American Restaurant,Cheese Shop,Italian Restaurant,Bookstore,Fish Market
5,M4N,Central Toronto,Lawrence Park,Park,Bus Line,Swim School,Yoga Studio,Dog Run,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
6,M4P,Central Toronto,Davisville North,Hotel,Park,Breakfast Spot,Department Store,Food & Drink Shop,Sandwich Place,Pizza Place,Gym / Fitness Center,Falafel Restaurant,Event Space
7,M4R,Central Toronto,"North Toronto West, Lawrence Park",Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Italian Restaurant,Gym / Fitness Center,Rental Car Location,Restaurant,Chinese Restaurant,Diner
8,M4S,Central Toronto,Davisville,Dessert Shop,Sandwich Place,Pizza Place,Gym,Italian Restaurant,Café,Coffee Shop,Sushi Restaurant,Greek Restaurant,Seafood Restaurant
9,M4T,Central Toronto,"Moore Park, Summerhill East",Trail,Tennis Court,Lawyer,Summer Camp,Yoga Studio,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant


In [51]:
kclusters = 6

torontoHarlem_grouped_clustering = torontoHarlem_grouped.drop(["PostalCode", "Borough", "Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(torontoHarlem_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 2, 2, 2, 1, 2, 2, 2, 5], dtype=int32)

In [52]:

#create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
torontoHarlem_merged = torontoHarlemDf.copy()

# add clustering labels
torontoHarlem_merged["Cluster Labels"] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
torontoHarlem_merged = torontoHarlem_merged.join(neighborhoods_venues_sorted.drop(["Borough", "Neighborhoods"], 1).set_index("PostalCode"), on="PostalCode")

print(torontoHarlem_merged.shape)
torontoHarlem_merged.head() # check the last columns!

(40, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Health Food Store,Pub,Trail,Neighborhood,Cosmetics Shop,Distribution Center,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,2,Greek Restaurant,Italian Restaurant,Coffee Shop,Furniture / Home Store,Ice Cream Shop,Restaurant,Bookstore,Grocery Store,Brewery,Bubble Tea Shop
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,2,Fast Food Restaurant,Restaurant,Movie Theater,Brewery,Liquor Store,Fish & Chips Shop,Sandwich Place,Steakhouse,Italian Restaurant,Sushi Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,2,Café,Coffee Shop,Gastropub,Bakery,Brewery,American Restaurant,Cheese Shop,Italian Restaurant,Bookstore,Fish Market
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,2,Park,Bus Line,Swim School,Yoga Studio,Dog Run,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


In [53]:
# sort the results by Cluster Labels
print(torontoHarlem_merged.shape)
torontoHarlem_merged.sort_values(["Cluster Labels"], inplace=True)
torontoHarlem_merged

(40, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,M6S,West Toronto,"Runnymede, Swansea",43.651571,-79.48445,0,Sushi Restaurant,Café,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Yoga Studio,French Restaurant,Restaurant,Latin American Restaurant
35,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325,0,Breakfast Spot,Gift Shop,Bookstore,Coffee Shop,Eastern European Restaurant,Bar,Movie Theater,Dog Run,Restaurant,Dessert Shop
34,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,0,Café,Mexican Restaurant,Thai Restaurant,Bookstore,Diner,Liquor Store,Cajun / Creole Restaurant,Flea Market,Park,Speakeasy
32,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,0,Bar,Asian Restaurant,Restaurant,Men's Store,Coffee Shop,Vegetarian / Vegan Restaurant,Café,Korean Restaurant,New American Restaurant,Japanese Restaurant
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,0,Pharmacy,Bakery,Café,Furniture / Home Store,Grocery Store,Pet Store,Brewery,Bar,Park,Bank
26,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.653206,-79.400049,0,Café,Coffee Shop,Dessert Shop,Mexican Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bar,Bakery,Pizza Place,Park
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,1,Hotel,Park,Breakfast Spot,Department Store,Food & Drink Shop,Sandwich Place,Pizza Place,Gym / Fitness Center,Falafel Restaurant,Event Space
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Health Food Store,Pub,Trail,Neighborhood,Cosmetics Shop,Distribution Center,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
21,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,2,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Gym,Japanese Restaurant,Seafood Restaurant,Deli / Bodega,Italian Restaurant
22,M5N,Central Toronto,Roselawn,43.711695,-79.416936,2,Garden,Yoga Studio,Fish Market,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


## Results: The above list of clustered neighborhoods shows that neighborhoods in cluster 2 exhibit similar characteristics to East Harlem, NYC.  Someone moving from East Harlem to Toronto might use this list to narrow down their search of new neighborhoods in Toronto.

## Discussion/Conclusion: Using venue data may not be the perfect way to capture the essence of a neighborhood, but it would provide a valid initial indication of which neighborhoods offer similar goods and services.  Given that such establishments exist to cater mainly to the local residents, this can serve as a good proxy for the "vibe" of a particular neighborhood.  Future improvements to this model could also include events data to capture additional information about each neighborhood. 