## Introduction/Business Problem

In this project, the assignment is to determine the best neighborhoods in which to open a bake shop in New York or Toronto. The givens are highly competitive production costs (to compete with other bake shops) and a talented sales team (to push our products to coffee shops). In this notebook, using k-means clustering and maps, we will find the neighborhoods with the highest concentration of coffee shops (potential clients) and then look for a location that is closest to them for distribution of the goods. I will begin with Toronto.

In [54]:
#Import Packages
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize # top-level import; requires pandas >= 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0
import folium

In [55]:
#Obtain and clean location data
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M') # download every table on the page
df = df[0] # the first table holds the postal code data
df = df[df['Borough']!='Not assigned'] # remove postal codes that have no assigned borough
df['Borough'].unique() # check that no other data anomalies exist
df = df.groupby(['Postcode','Borough'], as_index = False, sort = False).agg(','.join) # combine neighborhoods that share a postal code and borough
df['Neighbourhood'] = np.where(df["Neighbourhood"] == 'Not assigned', df['Borough'], df['Neighbourhood']) # replace missing neighborhood names with the borough name
df.shape
#Add longitude and latitude
df2 = pd.read_csv('http://cocl.us/Geospatial_data')
df = pd.merge(df, df2, how='left', left_on='Postcode', right_on='Postal Code') # join on the postal code columns by name
df = df[['Postcode','Borough', 'Neighbourhood', 'Latitude','Longitude']]
df.head(5)
#Get the coordinates of Toronto
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="Toronto_Location")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
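
As a small illustration of the grouping step above, the sketch below combines neighbourhoods that share a postal code on a hand-made table. The three rows are invented sample data (an assumption for illustration), not values read from the Wikipedia page.

```python
import pandas as pd

# Invented sample rows (assumption): two neighbourhoods share postal code M5A.
sample = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M5B'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Ryerson'],
})

# One row per (Postcode, Borough); neighbourhood names are joined with commas.
combined = (
    sample.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
    .agg(','.join)
    .reset_index()
)
print(combined)
```

Selecting the `Neighbourhood` column before aggregating makes it explicit which column is being joined.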


In [56]:
#Select the most concentrated area in Toronto (Downtown Toronto)

df_toronto = df[df['Borough'] == 'Downtown Toronto']
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

#Create a map of the neighborhoods in Downtown Toronto
# create map of Downtown Toronto using latitude and longitude values
Dtown_Toronto_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# add one marker per neighborhood
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Dtown_Toronto_map)

Dtown_Toronto_map

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [57]:
#Foursquare credentials
CLIENT_ID = 'X' # your Foursquare ID
CLIENT_SECRET = 'X' # your Foursquare Secret
VERSION = 'X' # Foursquare API version
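
Because the credentials above are placeholders, one option is to keep the real values out of the notebook and read them from environment variables. The following is a minimal sketch; the helper `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_API_VERSION` are hypothetical, not part of the original notebook or of any Foursquare tooling.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret, version) read from a mapping.

    The key names are hypothetical; use whatever names you export.
    """
    try:
        return (env['FOURSQUARE_CLIENT_ID'],
                env['FOURSQUARE_CLIENT_SECRET'],
                env['FOURSQUARE_API_VERSION'])
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

# Demonstration with a stand-in mapping so the example runs without real secrets.
fake_env = {
    'FOURSQUARE_CLIENT_ID': 'X',
    'FOURSQUARE_CLIENT_SECRET': 'X',
    'FOURSQUARE_API_VERSION': 'X',
}
demo_id, demo_secret, demo_version = load_foursquare_credentials(fake_env)
print(demo_id, demo_secret, demo_version)
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from the real environment.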

In [58]:
#Obtain top 20 Venues within 500 meters
LIMIT = 20 #limit for venues defined
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
toronto_venues = getNearbyVenues(names=df_toronto['Neighbourhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )
#One-hot encoding of the venue categories
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head(3)

#Most common 20 venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions past the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

#Cluster Neighborhoods N=4
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood') # keep only the category frequencies

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)


# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_toronto

# join the cluster labels and top venues onto df_toronto, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged = toronto_merged.dropna() # drop any neighborhoods that didn't return venue data

toronto_merged.head()

Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city
Church and Wellesley


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636,2,Coffee Shop,Breakfast Spot,Gym / Fitness Center,Bakery,...,Historic Site,Performing Arts Venue,Chocolate Shop,Concert Hall,College Gym,Cocktail Bar,Clothing Store,Comfort Food Restaurant,Church,Comic Shop
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,1,Café,Clothing Store,Comic Shop,Sandwich Place,...,Ramen Restaurant,Beer Bar,Music Venue,Theater,Diner,Thai Restaurant,Taco Place,Tea Room,Deli / Bodega,Cocktail Bar
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Gastropub,Coffee Shop,Restaurant,Japanese Restaurant,...,Hotel,American Restaurant,Food Truck,Café,Gym,Convenience Store,Comic Shop,College Gym,Concert Hall,Cocktail Bar
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Farmers Market,Beer Bar,Park,Basketball Stadium,...,Cocktail Bar,Fish Market,Tea Room,Thai Restaurant,Museum,French Restaurant,Vegetarian / Vegan Restaurant,Bistro,Clothing Store,College Gym
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,2,Coffee Shop,Bubble Tea Shop,Spa,Park,...,Italian Restaurant,Seafood Restaurant,Tea Room,Deli / Bodega,Comfort Food Restaurant,Chocolate Shop,Church,French Restaurant,Fountain,Clothing Store
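
`getNearbyVenues` indexes straight into the response, so an empty result or a venue without a category would raise an exception. The sketch below shows one defensive way to flatten such a response. The helper `extract_venues`, the `'Unknown'` fallback label and the sample payload are assumptions made for illustration; only the field names are taken from the request code above.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # 'Unknown' is an arbitrary fallback label (assumption).
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Invented payload (assumption) shaped like the fields getNearbyVenues reads.
sample_payload = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Sample Cafe',
                           'location': {'lat': 43.65, 'lng': -79.38},
                           'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'No Category Place',
                           'location': {'lat': 43.66, 'lng': -79.39},
                           'categories': []}},
            ]
        }]
    }
}
print(extract_venues(sample_payload))
print(extract_venues({}))  # an empty response yields an empty list
```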


In [59]:
#Create a cluster map
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [60]:
#Examination of cluster 1 (label 0)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
15,St. James Town,0,Gastropub,Coffee Shop,Restaurant,Japanese Restaurant,BBQ Joint,Creperie,Middle Eastern Restaurant,Church,...,Hotel,American Restaurant,Food Truck,Café,Gym,Convenience Store,Comic Shop,College Gym,Concert Hall,Cocktail Bar
25,Christie,0,Grocery Store,Café,Park,Italian Restaurant,Athletics & Sports,Restaurant,Coffee Shop,Diner,...,Nightclub,Cocktail Bar,Concert Hall,Clothing Store,Gastropub,Church,College Gym,Comfort Food Restaurant,Comic Shop,Fried Chicken Joint
42,"Design Exchange,Toronto Dominion Centre",0,Coffee Shop,Café,Restaurant,Deli / Bodega,Gastropub,Hotel,Pub,Japanese Restaurant,...,Fried Chicken Joint,American Restaurant,Gym,Gym / Fitness Center,Food Court,Convenience Store,Cocktail Bar,French Restaurant,College Gym,Comfort Food Restaurant
48,"Commerce Court,Victoria Hotel",0,Café,Gastropub,Coffee Shop,Restaurant,Bakery,Museum,Pub,Deli / Bodega,...,Gym,American Restaurant,Gym / Fitness Center,Fish Market,Concert Hall,Cocktail Bar,Fried Chicken Joint,French Restaurant,College Gym,Comfort Food Restaurant
84,"Chinatown,Grange Park,Kensington Market",0,Café,Vietnamese Restaurant,Mexican Restaurant,Wine Bar,Bakery,Farmers Market,Dessert Shop,Coffee Shop,...,Caribbean Restaurant,Bar,Cheese Shop,Gourmet Shop,Vegetarian / Vegan Restaurant,Arts & Crafts Store,Airport Gate,Comic Shop,Concert Hall,Airport Lounge
97,"First Canadian Place,Underground city",0,Café,Coffee Shop,Restaurant,Steakhouse,Pizza Place,Seafood Restaurant,Pub,Bar,...,Food Court,American Restaurant,Gym / Fitness Center,Gym,Convenience Store,Comfort Food Restaurant,Comic Shop,Cocktail Bar,Clothing Store,Concert Hall


In [61]:
#Examination of cluster 2 (label 1)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
9,"Ryerson,Garden District",1,Café,Clothing Store,Comic Shop,Sandwich Place,Movie Theater,Burrito Place,Burger Joint,Pizza Place,...,Ramen Restaurant,Beer Bar,Music Venue,Theater,Diner,Thai Restaurant,Taco Place,Tea Room,Deli / Bodega,Cocktail Bar
20,Berczy Park,1,Farmers Market,Beer Bar,Park,Basketball Stadium,Seafood Restaurant,Breakfast Spot,Liquor Store,Steakhouse,...,Cocktail Bar,Fish Market,Tea Room,Thai Restaurant,Museum,French Restaurant,Vegetarian / Vegan Restaurant,Bistro,Clothing Store,College Gym
30,"Adelaide,King,Richmond",1,Steakhouse,Plaza,Coffee Shop,Seafood Restaurant,Bar,Hotel,Speakeasy,Pizza Place,...,Opera House,American Restaurant,Café,Greek Restaurant,Concert Hall,Food Court,Neighborhood,Vegetarian / Vegan Restaurant,Lounge,College Gym
36,"Harbourfront East,Toronto Islands,Union Station",1,Café,Park,Italian Restaurant,Hotel,Salad Place,Plaza,Bistro,Bubble Tea Shop,...,Ice Cream Shop,Bakery,Sporting Goods Shop,Supermarket,Performing Arts Venue,Deli / Bodega,New American Restaurant,Neighborhood,Comfort Food Restaurant,Coffee Shop
80,"Harbord,University of Toronto",1,Bookstore,Restaurant,Bakery,Japanese Restaurant,Sandwich Place,Beer Bar,Bar,Italian Restaurant,...,Comfort Food Restaurant,College Gym,Café,Theater,French Restaurant,Sushi Restaurant,Comic Shop,Coffee Shop,Cocktail Bar,Concert Hall
87,"CN Tower,Bathurst Quay,Island airport,Harbourf...",1,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Sculpture Garden,Boat or Ferry,Bar,...,Airport Gate,Airport Food Court,College Gym,Comfort Food Restaurant,Dessert Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Dance Studio
92,Stn A PO Boxes 25 The Esplanade,1,Café,Farmers Market,Cocktail Bar,Museum,Clothing Store,Concert Hall,Beer Bar,Jazz Club,...,Food Truck,Vegetarian / Vegan Restaurant,French Restaurant,Fountain,Tea Room,Thai Restaurant,Tailor Shop,Comfort Food Restaurant,Chinese Restaurant,Chocolate Shop
96,"Cabbagetown,St. James Town",1,Café,Restaurant,General Entertainment,Gift Shop,Diner,Deli / Bodega,Indian Restaurant,Italian Restaurant,...,Caribbean Restaurant,Park,Butcher,Pet Store,Pub,Gastropub,Taiwanese Restaurant,Bakery,Basketball Stadium,Dance Studio
99,Church and Wellesley,1,Burger Joint,Pub,Restaurant,Bookstore,Salon / Barbershop,Juice Bar,Breakfast Spot,Japanese Restaurant,...,Creperie,Dance Studio,Park,Diner,Mexican Restaurant,Tea Room,Theme Restaurant,General Entertainment,Gastropub,Ramen Restaurant


In [62]:
#Examination of cluster 3 (label 2)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
2,"Harbourfront,Regent Park",2,Coffee Shop,Breakfast Spot,Gym / Fitness Center,Bakery,Park,Restaurant,Spa,Pub,...,Historic Site,Performing Arts Venue,Chocolate Shop,Concert Hall,College Gym,Cocktail Bar,Clothing Store,Comfort Food Restaurant,Church,Comic Shop
24,Central Bay Street,2,Coffee Shop,Bubble Tea Shop,Spa,Park,Sushi Restaurant,Modern European Restaurant,Gastropub,Ramen Restaurant,...,Italian Restaurant,Seafood Restaurant,Tea Room,Deli / Bodega,Comfort Food Restaurant,Chocolate Shop,Church,French Restaurant,Fountain,Clothing Store


In [63]:
#Examination of cluster 4 (label 3)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
91,Rosedale,3,Park,Trail,Playground,Building,Cosmetics Shop,Chinese Restaurant,Chocolate Shop,Church,...,Coffee Shop,College Gym,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Wine Bar,Cheese Shop,Dance Studio,Deli / Bodega
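
The cluster tables rank venue categories, but the business question is specifically about coffee shops. One way to compare neighbourhoods directly is to compute the share of returned venues that are potential clients. This is a sketch on invented rows; the neighbourhood names, the venue list and the choice of `client_categories` are assumptions, and in the notebook the same steps could be applied to `toronto_venues`.

```python
import pandas as pd

# Invented sample venues (assumption), using the same column names as toronto_venues.
venues = pd.DataFrame({
    'Neighbourhood': ['Neighbourhood A'] * 4 + ['Neighbourhood B'] * 4,
    'Venue Category': ['Coffee Shop', 'Café', 'Park', 'Bakery',
                       'Gastropub', 'Coffee Shop', 'Park', 'Pub'],
})

# Categories treated as potential clients (assumption).
client_categories = ['Coffee Shop', 'Café']

# Mean of a boolean column is the fraction of True values per neighbourhood.
is_client = venues['Venue Category'].isin(client_categories)
coffee_share = (
    is_client.groupby(venues['Neighbourhood'])
    .mean()
    .sort_values(ascending=False)
)
print(coffee_share)
```

Because each request is capped by `LIMIT`, the share reflects only the venues returned per neighbourhood, not every venue that exists there.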


# After reviewing the map and the data, I would suggest exploring cluster 1 (label 0) and cluster 3 (label 2) further. In these areas, coffee shops and cafés are among the most common venues. From there, strategic points can be selected that are the most convenient for delivering the goods.
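
To make the last step concrete, a rough way to pick a delivery-friendly point is to average the coordinates of the candidate neighbourhoods and check the great-circle distance from that point to each of them. This is a sketch, not the method used in the notebook: the helpers `haversine_km` and `centroid` are hypothetical, the three neighbourhoods are picked only as an illustration (their coordinates are copied from the table above), and a plain average of latitude and longitude is an approximation that is reasonable only over a small area.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def centroid(points):
    """Arithmetic mean of latitudes and longitudes (adequate for a small area)."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)

# Coordinates copied from the Downtown Toronto table above.
candidates = {
    'Harbourfront,Regent Park': (43.65426, -79.360636),
    'Central Bay Street': (43.657952, -79.387383),
    'St. James Town': (43.651494, -79.375418),
}

hub = centroid(candidates.values())
distances = {name: haversine_km(hub[0], hub[1], lat, lon)
             for name, (lat, lon) in candidates.items()}
for name, km in distances.items():
    print('{}: {:.2f} km from the candidate hub'.format(name, km))
```

Straight-line distance ignores the road network, so this only narrows down where to look.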

# New York location exploration

In [64]:
#Obtain and clean location data
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # the coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [65]:
#Create map of NY data
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

## The Staten Island neighborhoods look well concentrated, so we will select this borough.

In [66]:
si_data = neighborhoods[neighborhoods['Borough'] == 'Staten Island'].reset_index(drop=True)
si_data.head()


#latitude = 40.571944
#longitude = -74.146944
address = 'Staten Island, New York'

geolocator = Nominatim(user_agent="NewYork")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Staten Island are {}, {}.'.format(latitude, longitude))

#Create a map of Staten Island

# create map of Staten Island using latitude and longitude values
map_si = folium.Map(location=[latitude, longitude], zoom_start=12)

# add one marker per neighborhood
for lat, lng, label in zip(si_data['Latitude'], si_data['Longitude'], si_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_si)

map_si

The geographical coordinates of Staten Island are 40.5834557, -74.1496048.


In [67]:
#Obtain top 20 Venues within 500 meters

LIMIT = 20 #limit for venues defined
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
si_venues = getNearbyVenues(names=si_data['Neighborhood'],
                                   latitudes=si_data['Latitude'],
                                   longitudes=si_data['Longitude']
                                  )
#One-hot encoding of the venue categories
si_onehot = pd.get_dummies(si_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
si_onehot['Neighborhood'] = si_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [si_onehot.columns[-1]] + list(si_onehot.columns[:-1])
si_onehot = si_onehot[fixed_columns]

si_grouped = si_onehot.groupby('Neighborhood').mean().reset_index()
si_grouped.head(3)

#Most common 20 venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions past the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = si_grouped['Neighborhood']

for ind in np.arange(si_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(si_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

#Cluster Neighborhoods N=4
# set number of clusters
kclusters = 4

si_grouped_clustering = si_grouped.drop(columns='Neighborhood') # keep only the category frequencies

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(si_grouped_clustering)


# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

si_merged = si_data

# join the cluster labels and top venues onto si_data, which holds latitude/longitude for each neighborhood
si_merged = si_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

si_merged = si_merged.dropna() # drop any neighborhoods that didn't return venue data

si_merged.head()

St. George
New Brighton
Stapleton
Rosebank
West Brighton
Grymes Hill
Todt Hill
South Beach
Port Richmond
Mariner's Harbor
Port Ivory
Castleton Corners
New Springville
Travis
New Dorp
Oakwood
Great Kills
Eltingville
Annadale
Woodrow
Tottenville
Tompkinsville
Silver Lake
Sunnyside
Park Hill
Westerleigh
Graniteville
Arlington
Arrochar
Grasmere
Old Town
Dongan Hills
Midland Beach
Grant City
New Dorp Beach
Bay Terrace
Huguenot
Pleasant Plains
Butler Manor
Charleston
Rossville
Arden Heights
Greenridge
Heartland Village
Chelsea
Bloomfield
Bulls Head
Richmond Town
Shore Acres
Clifton
Concord
Emerson Hill
Randall Manor
Howland Hook
Elm Park
Manor Heights
Willowbrook
Sandy Ground
Egbertville
Prince's Bay
Lighthouse Hill
Richmond Valley
Fox Hills


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Staten Island,St. George,40.644982,-74.079353,0.0,Clothing Store,Italian Restaurant,Bar,Tapas Restaurant,Art Gallery,...,Pizza Place,Plaza,Steakhouse,American Restaurant,Toy / Game Store,Theater,Tex-Mex Restaurant,Dog Run,Fast Food Restaurant,Falafel Restaurant
1,Staten Island,New Brighton,40.640615,-74.087017,1.0,Bus Stop,Park,Deli / Bodega,Bowling Alley,Discount Store,...,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Event Space,Falafel Restaurant,Dim Sum Restaurant,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop
2,Staten Island,Stapleton,40.626928,-74.077902,0.0,Discount Store,Sandwich Place,Mexican Restaurant,Sri Lankan Restaurant,Beer Bar,...,Skate Park,Fast Food Restaurant,Spanish Restaurant,Pizza Place,Asian Restaurant,Bank,Bar,Gourmet Shop,Hardware Store,Electronics Store
3,Staten Island,Rosebank,40.615305,-74.069805,0.0,Italian Restaurant,Pizza Place,Grocery Store,Bagel Shop,Mexican Restaurant,...,Breakfast Spot,Restaurant,Cosmetics Shop,Ice Cream Shop,Video Store,Discount Store,Filipino Restaurant,Fast Food Restaurant,Flower Shop,Dry Cleaner
4,Staten Island,West Brighton,40.631879,-74.107182,0.0,Coffee Shop,Italian Restaurant,Music Store,Taco Place,Bagel Shop,...,German Restaurant,Ice Cream Shop,Juice Bar,Mexican Restaurant,American Restaurant,Wings Joint,Donut Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
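
The column headers above get their suffixes from a three-item list plus an exception handler. That works for 20 columns but would label a 21st column "21th". A small helper handles the general case; `ordinal` is a hypothetical name introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 20
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```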


In [73]:
#Create a cluster map
# create map
map_si = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(si_merged['Latitude'], si_merged['Longitude'], si_merged['Neighborhood'], si_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_si)
       
map_si

In [77]:
#Examination of cluster 1 (label 0)
si_merged.loc[si_merged['Cluster Labels'] == 0, si_merged.columns[[1] + list(range(5, si_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,St. George,Clothing Store,Italian Restaurant,Bar,Tapas Restaurant,Art Gallery,Baseball Stadium,Deli / Bodega,Donut Shop,Hot Dog Joint,...,Pizza Place,Plaza,Steakhouse,American Restaurant,Toy / Game Store,Theater,Tex-Mex Restaurant,Dog Run,Fast Food Restaurant,Falafel Restaurant
2,Stapleton,Discount Store,Sandwich Place,Mexican Restaurant,Sri Lankan Restaurant,Beer Bar,Park,Residential Building (Apartment / Condo),Restaurant,Donut Shop,...,Skate Park,Fast Food Restaurant,Spanish Restaurant,Pizza Place,Asian Restaurant,Bank,Bar,Gourmet Shop,Hardware Store,Electronics Store
3,Rosebank,Italian Restaurant,Pizza Place,Grocery Store,Bagel Shop,Mexican Restaurant,Cajun / Creole Restaurant,Eastern European Restaurant,Donut Shop,Sandwich Place,...,Breakfast Spot,Restaurant,Cosmetics Shop,Ice Cream Shop,Video Store,Discount Store,Filipino Restaurant,Fast Food Restaurant,Flower Shop,Dry Cleaner
4,West Brighton,Coffee Shop,Italian Restaurant,Music Store,Taco Place,Bagel Shop,Bar,Board Shop,Burger Joint,Café,...,German Restaurant,Ice Cream Shop,Juice Bar,Mexican Restaurant,American Restaurant,Wings Joint,Donut Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant
7,South Beach,Deli / Bodega,Pier,Athletics & Sports,Beach,Yoga Studio,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,...,Flower Shop,Filipino Restaurant,Event Space,Falafel Restaurant,Gas Station,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run
9,Mariner's Harbor,Italian Restaurant,Deli / Bodega,Ice Cream Shop,Bus Stop,Nightlife Spot,Athletics & Sports,Other Repair Shop,Fast Food Restaurant,Food & Drink Shop,...,Flower Shop,Filipino Restaurant,Event Space,Falafel Restaurant,French Restaurant,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run
11,Castleton Corners,Pizza Place,Bank,Deli / Bodega,Japanese Restaurant,Hardware Store,Grocery Store,Mini Golf,Sandwich Place,Burger Joint,...,Ice Cream Shop,Bagel Shop,Bakery,Filipino Restaurant,Gas Station,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food
12,New Springville,Health & Beauty Service,Bagel Shop,Ice Cream Shop,Liquor Store,Mobile Phone Shop,Shopping Mall,Bookstore,Coffee Shop,Mexican Restaurant,...,Grocery Store,Soup Place,Donut Shop,Restaurant,Pharmacy,Deli / Bodega,Pizza Place,Hookah Bar,Flower Shop,Dog Run
13,Travis,Hotel,Bowling Alley,Deli / Bodega,Comedy Club,Spanish Restaurant,Park,Gym,Baseball Field,Sports Club,...,Donut Shop,Electronics Store,Event Space,Eastern European Restaurant,Dry Cleaner,Dog Run,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Flower Shop
14,New Dorp,Italian Restaurant,Pizza Place,Bank,Hobby Shop,Mexican Restaurant,Dim Sum Restaurant,Dessert Shop,Deli / Bodega,Salon / Barbershop,...,Coffee Shop,Sushi Restaurant,Indian Restaurant,Bagel Shop,Vietnamese Restaurant,Bakery,Bar,Flower Shop,Furniture / Home Store,French Restaurant


In [79]:
# Examination of cluster #2 (Cluster Labels == 1)
si_merged.loc[si_merged['Cluster Labels'] == 1, si_merged.columns[[1] + list(range(5, si_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
1,New Brighton,Bus Stop,Park,Deli / Bodega,Bowling Alley,Discount Store,Convenience Store,Playground,Food Truck,Food & Drink Shop,...,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Event Space,Falafel Restaurant,Dim Sum Restaurant,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop
8,Port Richmond,Deli / Bodega,Bus Stop,Rental Car Location,Donut Shop,Pizza Place,Mexican Restaurant,Falafel Restaurant,Food & Drink Shop,Food,...,Filipino Restaurant,Fast Food Restaurant,Electronics Store,Event Space,French Restaurant,Eastern European Restaurant,Dry Cleaner,Dog Run,Discount Store,Diner
15,Oakwood,Women's Store,Bar,Bus Stop,Playground,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,...,Fast Food Restaurant,Falafel Restaurant,Gas Station,Event Space,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run,Discount Store
24,Park Hill,Bus Stop,Hotel,Athletics & Sports,Gym / Fitness Center,Coffee Shop,Yoga Studio,Falafel Restaurant,Food Truck,Food & Drink Shop,...,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Event Space,Furniture / Home Store,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run
27,Arlington,Bus Stop,American Restaurant,Boat or Ferry,Grocery Store,Coffee Shop,Fast Food Restaurant,Food Truck,Food & Drink Shop,Food,...,Filipino Restaurant,Falafel Restaurant,Furniture / Home Store,Event Space,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run,Discount Store
29,Grasmere,Bus Stop,Grocery Store,Italian Restaurant,Bank,Japanese Restaurant,Nail Salon,Park,Pharmacy,Ice Cream Shop,...,Cosmetics Shop,Deli / Bodega,Bakery,Vegetarian / Vegan Restaurant,Bagel Shop,Flower Shop,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop
32,Midland Beach,Bus Stop,Beach,Deli / Bodega,Basketball Court,Restaurant,Pet Store,Café,Bookstore,Dessert Shop,...,Health & Beauty Service,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Event Space,Hotel,Electronics Store
38,Butler Manor,Pool,Baseball Field,Convenience Store,Bus Stop,Yoga Studio,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,...,Flower Shop,Filipino Restaurant,Falafel Restaurant,Gas Station,Event Space,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run
41,Arden Heights,Coffee Shop,Bus Stop,Pizza Place,Lawyer,Pharmacy,Falafel Restaurant,Food Truck,Food & Drink Shop,Food,...,Filipino Restaurant,Fast Food Restaurant,Yoga Studio,Event Space,Furniture / Home Store,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run
42,Greenridge,Diner,Pub,Playground,Pizza Place,Bus Stop,Bowling Alley,Fast Food Restaurant,Food Truck,Food & Drink Shop,...,Flower Shop,Filipino Restaurant,Electronics Store,Falafel Restaurant,Event Space,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run,Discount Store


In [80]:
# Examination of cluster #3 (Cluster Labels == 2)
si_merged.loc[si_merged['Cluster Labels'] == 2, si_merged.columns[[1] + list(range(5, si_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
6,Todt Hill,Park,Yoga Studio,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant,...,Event Space,Intersection,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run,Discount Store,Diner,Dim Sum Restaurant


In [81]:
# Examination of cluster #4 (Cluster Labels == 3)
si_merged.loc[si_merged['Cluster Labels'] == 3, si_merged.columns[[1] + list(range(5, si_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
5,Grymes Hill,Moving Target,Dog Run,Gym,Yoga Studio,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,...,Filipino Restaurant,Falafel Restaurant,Gas Station,Event Space,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Discount Store,Diner
22,Silver Lake,American Restaurant,Burger Joint,Golf Course,Gym,Furniture / Home Store,Food Truck,Food & Drink Shop,Food,Flower Shop,...,Fast Food Restaurant,Falafel Restaurant,Event Space,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run,Discount Store,Diner
23,Sunnyside,American Restaurant,Theater,Spa,Grocery Store,Gym,Market,Food & Drink Shop,Food,Flower Shop,...,Fast Food Restaurant,Falafel Restaurant,Event Space,French Restaurant,Electronics Store,Eastern European Restaurant,Dry Cleaner,Donut Shop,Dog Run,Discount Store


# After reviewing the map and data for Staten Island, cluster 2 is the one worth exploring if we expand here, because of its high volume of food, breakfast, and deli locations. However, coffee shops are not as common in Staten Island as they are in Toronto, so I would suggest avoiding Staten Island altogether. In addition, given the popularity of local delis and donut and bagel shops, the market appears oversaturated with baked goods produced fresh on the premises, so I would forecast low demand for baked goods produced by our bakery.
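The visual comparison above could be backed by a quick count of how many neighborhoods in each cluster list a coffee-related category among their top-ranked venues. The following is only a minimal sketch: the helper name `category_hits_by_cluster`, its `top_n` parameter, and the toy DataFrame are assumptions made for illustration, and the toy cluster labels are not the notebook's actual assignments. In the notebook, `si_merged` would take the place of `toy`.

```python
import pandas as pd


def category_hits_by_cluster(merged, categories, top_n=5, label_col="Cluster Labels"):
    """Per cluster, count neighborhoods whose top_n ranked venues include any category.

    Hypothetical helper: assumes the ranked columns end in 'Most Common Venue'
    and appear in rank order, as they do in the tables above.
    """
    venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")][:top_n]
    # True for each neighborhood with at least one matching venue category
    hits = merged[venue_cols].isin(categories).any(axis=1)
    summary = hits.groupby(merged[label_col]).agg(["sum", "count"])
    return summary.rename(columns={"sum": "neighborhoods_with_hit", "count": "neighborhoods"})


# Toy stand-in for si_merged (illustrative rows and labels only)
toy = pd.DataFrame({
    "Neighborhood": ["West Brighton", "New Brighton", "Arden Heights", "Todt Hill", "Silver Lake"],
    "Cluster Labels": [0, 1, 1, 2, 3],
    "1st Most Common Venue": ["Coffee Shop", "Bus Stop", "Coffee Shop", "Park", "American Restaurant"],
    "2nd Most Common Venue": ["Italian Restaurant", "Park", "Bus Stop", "Yoga Studio", "Burger Joint"],
    "3rd Most Common Venue": ["Music Store", "Deli / Bodega", "Pizza Place", "Falafel Restaurant", "Golf Course"],
})

summary = category_hits_by_cluster(toy, ["Coffee Shop", "Café"])
print(summary)
```

Keeping `top_n` small is a deliberate choice: the lower ranks in the tables above look like alphabetical filler for categories that may not occur in the neighborhood at all, so a match there probably says little about real venues.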


# Conclusion: Avoid Staten Island as a possible expansion area for our bakery, and concentrate on expanding in the downtown Toronto area.