# Capstone Project Report - Clustering Toronto and Vancouver Neighborhoods

## I. Introduction & Business Problem

Canada is a diverse and physically large country, and people and businesses often move between its cities: an individual relocating for a job, or a restaurant opening a branch in another city. In either case, it helps to know which neighborhoods in the destination city resemble those in the current one.

**Business Problem:** Given a neighborhood in our own city, how can we find a similar neighborhood in another city?

**Target Audience:** Anyone interested in moving between cities, including businesses looking for a strategy to identify new locations in other cities.

## II. Data

To address this business problem with a proof-of-concept solution, we focus on Toronto neighborhoods and their similar neighborhoods in Vancouver. The two cities are far apart: Toronto lies in the eastern part of continental North America, and Vancouver is on the West Coast.

**List of Neighborhoods:** As in the Toronto Lab, we first obtain a complete list of neighborhoods for the two cities from **Wikipedia**.

**Geo Location:** Using the geopy library, we join the list of neighborhoods to their respective coordinates.

**Foursquare API:** We then use the Foursquare API to obtain venue data that characterizes each neighborhood, and join that data to the list of neighborhoods. The final dataset is used for K-means clustering, which groups similar neighborhoods into clusters.
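
The geocoding join described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names `add_coordinates` and `make_nominatim_geocoder` are hypothetical, and the geocoder is passed in as a callable so that geopy or another service can be plugged in. The demo uses an offline stand-in that returns two coordinates copied from the result tables in this report, so it runs without network access.

```python
from types import SimpleNamespace

import pandas as pd


def add_coordinates(codes, geocode):
    """Attach Latitude/Longitude columns to a table of postal codes.

    `geocode` is any callable mapping a query string to an object with
    `.latitude` and `.longitude` attributes, or to None when nothing is found.
    """
    out = codes.copy()
    # Same "search" format as the result tables, e.g. "M5A, ON".
    out["search"] = out["PostalCode"] + ", " + out["Province"]
    lats, lons = [], []
    for query in out["search"]:
        hit = geocode(query)
        lats.append(hit.latitude if hit is not None else float("nan"))
        lons.append(hit.longitude if hit is not None else float("nan"))
    out["Latitude"] = lats
    out["Longitude"] = lons
    return out


def make_nominatim_geocoder(user_agent="capstone-example"):
    # Assumption: geopy's Nominatim backend. It needs network access and is
    # rate limited, so it is not called in the offline demo below.
    from geopy.geocoders import Nominatim
    return Nominatim(user_agent=user_agent).geocode


# Offline stand-in geocoder, using two coordinates from the result tables.
known = {
    "M5A, ON": SimpleNamespace(latitude=43.654260, longitude=-79.360636),
    "V5E, BC": SimpleNamespace(latitude=49.231088, longitude=-122.947268),
}
codes = pd.DataFrame({"PostalCode": ["M5A", "V5E"], "Province": ["ON", "BC"]})
demo = add_coordinates(codes, known.get)
print(demo)
```

To query a live service instead, `add_coordinates(codes, make_nominatim_geocoder())` should work, subject to the provider's usage limits.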

## III. Methodology

To find similar neighborhoods in a combined pool of Vancouver and Toronto neighborhoods, the following steps were completed:

1. Getting a list of neighborhoods from **Wikipedia**. A combination of **BeautifulSoup** and manual collection was used to retrieve the postal codes of the respective cities. The **Google Maps API** was then queried with each postal code and its province to obtain geocoordinates.

2. With the coordinates for each postal code, the **Foursquare API** was used to retrieve the most popular venues in each neighborhood. Dummy variables were then generated for the venue categories.

3. **K-Means Clustering** (k=10) was used to identify similar neighborhoods.
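
Steps 2 and 3 can be sketched as below, assuming scikit-learn's `KMeans` (the source does not name the library). The function name `cluster_neighborhoods`, the column name `VenueCategory`, and the three-neighborhood toy table are illustrative assumptions. The toy table is not real Foursquare output, and it uses k=2 because k=10 needs at least ten neighborhoods.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(venues, k, seed=0):
    """One-hot encode venue categories, average per postal code, run K-means."""
    # One dummy (0/1) column per venue category.
    dummies = pd.get_dummies(venues["VenueCategory"], dtype=float)
    dummies.insert(0, "PostalCode", venues["PostalCode"].to_numpy())
    # Mean of the dummies = share of each category within a postal code.
    profile = dummies.groupby("PostalCode").mean()
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(profile.to_numpy())
    return profile.assign(Cluster=labels)


# Toy input: one row per venue found near a postal code (illustrative only).
venues = pd.DataFrame({
    "PostalCode": ["M4W", "M4W", "M4W", "V6G", "V6G", "V6G", "M5B", "M5B", "M5B"],
    "VenueCategory": ["Park", "Playground", "Trail",
                      "Trail", "Park", "Lake",
                      "Clothing Store", "Coffee Shop", "Café"],
})
result = cluster_neighborhoods(venues, k=2)
print(result["Cluster"])
```

In this toy run the two park-and-trail postal codes share a label and the shopping-oriented one gets its own. The label numbers themselves are arbitrary.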

## IV. Results

In general, the neighborhoods clustered well, with a few exceptions: clusters 1, 6, 8, and 9 each contain a single neighborhood, with no similar neighborhoods alongside it. Some clusters deserving of attention include:

* **Cluster 2:** Includes many Falafel Restaurants among the most common venues.
* **Cluster 3:** Dominated by Zoos and Parks in the top venues.
* **Cluster 4:** Construction & Landscaping venues with Ethiopian Restaurants.
* **Cluster 5:** All three neighborhoods' top 10 venues match exactly, and all three are in British Columbia.
* **Cluster 10:** Trails and Parks.
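
The "Nth Most Common Venue" columns referred to in this list can be derived from a per-neighborhood frequency table, as in the sketch below. The helper names `ordinal` and `top_venues` and the one-row table are hypothetical. The sketch also illustrates a caveat, offered here only as a possibility the source does not confirm: when a neighborhood has fewer than ten distinct categories, a plain top-10 ranking fills the remaining slots with zero-frequency categories, which may be why names such as Zoo or Ethiopian Restaurant recur across many rows.

```python
import pandas as pd


def ordinal(i):
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer."""
    if 10 <= i % 100 <= 20:
        return f"{i}th"
    return f"{i}" + {1: "st", 2: "nd", 3: "rd"}.get(i % 10, "th")


def top_venues(profile, n=10, drop_zero=False):
    """List the n most frequent categories for each row of a frequency table."""
    cols = [f"{ordinal(i)} Most Common Venue" for i in range(1, n + 1)]
    rows = {}
    for code, freqs in profile.iterrows():
        ranked = freqs.sort_values(ascending=False)
        if drop_zero:
            # Keep only categories that actually occur in this neighborhood.
            ranked = ranked[ranked > 0]
        names = list(ranked.index[:n])
        # Pad with None so every row has exactly n entries.
        rows[code] = names + [None] * (n - len(names))
    return pd.DataFrame.from_dict(rows, orient="index", columns=cols)


# Toy frequency table: a postal code with a single observed category.
profile = pd.DataFrame(
    {"Electronics Store": [0.0], "Farm": [0.0],
     "Rental Car Location": [1.0], "Zoo": [0.0]},
    index=["V9S"],
)
padded = top_venues(profile, n=3)                  # zero-frequency fillers kept
strict = top_venues(profile, n=3, drop_zero=True)  # fillers replaced by None
print(padded)
print(strict)
```

With `drop_zero=True`, only venue categories that were actually observed are listed, which makes sparse neighborhoods easier to spot.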

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

In [6]:
#cluster1
cluster1 = pd.read_csv("cluster1.csv")
cluster1

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V5E,BC,"V5E, BC",49.231088,-122.947268,Other Repair Shop,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm


In [3]:
#cluster2
cluster2 = pd.read_csv("cluster2.csv")
cluster2

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,ON,"M5A, ON",43.654260,-79.360636,Coffee Shop,Bakery,Park,Pub,Café,Breakfast Spot,Theater,Bank,Beer Store,French Restaurant
1,M1B,ON,"M1B, ON",43.806686,-79.194353,Print Shop,Fast Food Restaurant,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
2,M3B,ON,"M3B, ON",43.745906,-79.352188,Japanese Restaurant,Café,Beer Store,Restaurant,Coffee Shop,Gym,Asian Restaurant,Sporting Goods Shop,Caribbean Restaurant,Chinese Restaurant
3,M5B,ON,"M5B, ON",43.656081,-79.380171,Clothing Store,Coffee Shop,Italian Restaurant,Bubble Tea Shop,Restaurant,Café,Cosmetics Shop,Tea Room,Middle Eastern Restaurant,Diner
4,M6B,ON,"M6B, ON",43.709577,-79.445073,Pub,Japanese Restaurant,Sushi Restaurant,Park,Zoo,Fair,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,V9E,BC,"V9E, BC",48.527525,-123.461885,Lake,Zoo,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
146,V9R,BC,"V9R, BC",49.165722,-124.000170,Lake,Construction & Landscaping,Tennis Court,Campground,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
147,V9V,BC,"V9V, BC",49.237891,-124.032354,Home Service,Recreation Center,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
148,V9W,BC,"V9W, BC",50.014661,-125.260855,Tour Provider,Furniture / Home Store,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair


In [4]:
#cluster3
cluster3 = pd.read_csv("cluster3.csv")
cluster3

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,ON,"M3A, ON",43.753259,-79.329656,Food & Drink Shop,Park,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
1,M6E,ON,"M6E, ON",43.689026,-79.453512,Park,Women's Store,Pool,Zoo,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit
2,M4J,ON,"M4J, ON",43.685347,-79.338106,Park,Metro Station,Convenience Store,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
3,M4N,ON,"M4N, ON",43.72802,-79.38879,Swim School,Park,Bus Line,Zoo,Fair,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
4,M9N,ON,"M9N, ON",43.706876,-79.518188,Park,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm
5,M2P,ON,"M2P, ON",43.752758,-79.400049,Park,Convenience Store,Bank,Zoo,Falafel Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
6,M9R,ON,"M9R, ON",43.688905,-79.554724,Sandwich Place,Mobile Phone Shop,Park,Zoo,Fair,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
7,M1V,ON,"M1V, ON",43.815252,-79.284577,Playground,Park,Zoo,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
8,M4W,ON,"M4W, ON",43.679563,-79.377529,Park,Playground,Trail,Zoo,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit
9,M8X,ON,"M8X, ON",43.653654,-79.506944,Pool,Park,River,Zoo,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit


In [5]:
#cluster4
cluster4 = pd.read_csv("cluster4.csv")
cluster4

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1J,ON,"M1J, ON",43.744734,-79.239476,Construction & Landscaping,Playground,Zoo,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
1,M8Y,ON,"M8Y, ON",43.636258,-79.498509,Construction & Landscaping,Baseball Field,Zoo,Farm,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
2,V2S,BC,"V2S, BC",49.017244,-122.284334,Farm,Construction & Landscaping,Zoo,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
3,V3A,BC,"V3A, BC",49.083738,-122.645903,Construction & Landscaping,Zoo,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
4,V9T,BC,"V9T, BC",49.21585,-123.986978,Construction & Landscaping,Zoo,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant


In [7]:
#cluster5
cluster5 = pd.read_csv("cluster5.csv")
cluster5

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V1S,BC,"V1S, BC",50.623004,-120.410181,Business Service,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm
1,V1V,BC,"V1V, BC",49.950056,-119.428336,Business Service,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm
2,V4X,BC,"V4X, BC",49.088824,-122.415286,Business Service,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm


In [8]:
#cluster6
cluster6 = pd.read_csv("cluster6.csv")
cluster6

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V9S,BC,"V9S, BC",49.18839,-123.975037,Rental Car Location,Zoo,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant


In [10]:
#cluster7
cluster7 = pd.read_csv("cluster7.csv")
cluster7

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M9M,ON,"M9M, ON",43.724766,-79.532242,Baseball Field,Zoo,Farm,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant,Farmers Market
1,V9A,BC,"V9A, BC",48.437369,-123.411284,Baseball Field,Zoo,Farm,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant,Farmers Market


In [11]:
#cluster8
cluster8 = pd.read_csv("cluster8.csv")
cluster8

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V9M,BC,"V9M, BC",49.70103,-124.919617,Shop & Service,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm


In [12]:
#cluster9
cluster9 = pd.read_csv("cluster9.csv")
cluster9

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V3W,BC,"V3W, BC",49.146665,-122.856807,Bus Station,Zoo,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Farm


In [13]:
#cluster10
cluster10 = pd.read_csv("cluster10.csv")
cluster10

Unnamed: 0,PostalCode,Province,search,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5P,ON,"M5P, ON",43.696948,-79.411307,Jewelry Store,Sushi Restaurant,Trail,Park,Zoo,Fair,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit
1,M4T,ON,"M4T, ON",43.689574,-79.38316,Restaurant,Park,Trail,Zoo,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
2,V4E,BC,"V4E, BC",49.117583,-122.903473,Trail,Zoo,Drugstore,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
3,V4P,BC,"V4P, BC",49.056842,-122.819221,Construction & Landscaping,Park,Trail,Zoo,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit
4,V4R,BC,"V4R, BC",49.240585,-122.561155,Lake,Trail,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
5,V6G,BC,"V6G, BC",49.30426,-123.143792,Trail,Park,Lake,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
6,V6S,BC,"V6S, BC",49.253306,-123.222192,Trail,Zoo,Drugstore,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair,Falafel Restaurant
7,V7C,BC,"V7C, BC",49.158647,-123.172266,Bus Stop,Sushi Restaurant,Trail,Zoo,Falafel Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Fair
8,V8X,BC,"V8X, BC",48.478664,-123.362262,Construction & Landscaping,Park,Trail,Zoo,Fair,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit
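
The ten cells above each load one file. As a compact alternative, the sketch below reads every `clusterN.csv` in a folder and reports each cluster's size and its Ontario/British Columbia split, which makes single-member clusters easy to spot. The function name `summarize_clusters` is hypothetical. The demo writes two small files containing only the `PostalCode` and `Province` values of clusters 6 and 7 shown above, so it runs without the original CSVs.

```python
import tempfile
from pathlib import Path

import pandas as pd


def summarize_clusters(folder, k=10):
    """Count members per cluster file, split by Province."""
    rows = []
    for i in range(1, k + 1):
        path = Path(folder) / f"cluster{i}.csv"
        if not path.exists():
            continue  # skip clusters that were not exported
        members = pd.read_csv(path, index_col=0)
        counts = members["Province"].value_counts()
        rows.append({
            "Cluster": i,
            "Size": len(members),
            "ON": int(counts.get("ON", 0)),
            "BC": int(counts.get("BC", 0)),
        })
    return pd.DataFrame(rows).set_index("Cluster")


# Demo with two reduced files written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"PostalCode": ["V9S"], "Province": ["BC"]}).to_csv(
        Path(tmp) / "cluster6.csv")
    pd.DataFrame({"PostalCode": ["M9M", "V9A"], "Province": ["ON", "BC"]}).to_csv(
        Path(tmp) / "cluster7.csv")
    summary = summarize_clusters(tmp)

print(summary)
```

A cluster whose members all come from one province offers no cross-city match, so the `ON` and `BC` columns show at a glance which clusters address the business problem.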


## V. Discussion and Conclusion

The goal of this project has value. Nevertheless, the results suggest that the choice of the number of clusters could be refined. Known clusters could also have been used to construct training and testing sets for evaluating the K-means clustering algorithm.
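
One common way to refine the number of clusters is to compare silhouette scores across candidate values of k. The sketch below assumes scikit-learn and uses synthetic, well-separated points as a stand-in for the neighborhood feature matrix. The function name `silhouette_by_k` is hypothetical, and nothing here reproduces the project's actual data or results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, seed=0):
    """Fit K-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in data: three tight groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real venue data the scores are likely to be much flatter than on this toy example, so the highest-scoring k is best treated as a starting point rather than a definitive answer.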

That said, some interesting clusters emerged: neighborhoods sharing Ethiopian Restaurants, and neighborhoods sharing Trails, were grouped together. Another clustering algorithm (e.g., hierarchical clustering) could also have been run alongside K-means as a point of comparison.
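
A comparison with hierarchical clustering could look like the following sketch. It assumes scikit-learn's `AgglomerativeClustering` with Ward linkage and measures agreement between the two partitions with the adjusted Rand index. The function name `compare_partitions` and the synthetic points are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score


def compare_partitions(X, k, seed=0):
    """Cluster X with K-means and Ward hierarchical clustering; return their ARI."""
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    hc_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    # 1.0 means identical partitions; values near 0 mean chance-level agreement.
    return adjusted_rand_score(km_labels, hc_labels)


# Synthetic stand-in data: three tight groups of 20 points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

agreement = compare_partitions(X, k=3)
print(agreement)
```

A low agreement score on the real data would suggest that the clusters depend heavily on the algorithm chosen, and that they deserve closer inspection before being used to recommend neighborhoods.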