In this project, I assume that I have decided to open an Italian restaurant.  
To choose a neighborhood for it, I need to find the locations  
of the restaurants in every neighborhood using Foursquare, and then  
cluster the neighborhoods according to the popularity of the restaurant types already present.  
Let's see the neighborhood alternatives for my restaurant!

In [None]:
import pandas as pd
import numpy as np

In [None]:
#Read table
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df = tables[0]

#Remove all not assigned labels
df = df[df['Borough'] != 'Not assigned']

#Combine multiple entries that share a postal code
df = df.groupby(['Postal Code','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df

In [None]:
#Read latitude and longitude from CSV file
location_df = pd.read_csv('Geospatial_Coordinates.csv')
location_df

In [None]:
df = df.sort_values(by = 'Postal Code')
location_df = location_df.sort_values(by = 'Postal Code')
df = df.merge(location_df, on='Postal Code')
df
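
The merge above is an inner join, so any postal code without a matching row in the coordinates file is silently dropped. A minimal sketch of how such rows could be detected is shown below; the two small frames and their coordinate values are made up for illustration and are not taken from the real files.

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are illustrative only.
neighborhoods = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M9Z'],
    'Neighborhood': ['Example A', 'Example B', 'Example C'],
})
coordinates = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.81, 43.78],
    'Longitude': [-79.19, -79.16],
})

# A left join keeps every neighborhood; the _merge column records
# whether a coordinate row was found for it.
checked = neighborhoods.merge(coordinates, on='Postal Code',
                              how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

If `missing` is empty for the real data, the inner join loses nothing.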

Let's find the latitude and longitude of Toronto

In [None]:
from geopy.geocoders import Nominatim
import folium

address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

Draw the map of neighborhoods of Toronto

In [None]:
# create map of Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per neighborhood to the map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#003300',
        fill=True,
        fill_color='#009933',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

We need Foursquare credentials to retrieve the venues near each neighborhood

In [None]:
CLIENT_ID = 'ED50LRDOI5KI2IFP54RPTUJHNPMVF0ZHJV5RGKCGV0T1KXUG' # your Foursquare ID
CLIENT_SECRET = 'HMHZURR30PUCHCJZKWLDAJ1TSD25VGQJTWQDCALGS2MV3MI1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

In [None]:
import requests

#For each neighborhood, query for all nearby venues 
def getNearbyVenues(names, latitudes, longitudes, radius= 1000, LIMIT = 1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request; requests encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}

        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
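
The parsing step in `getNearbyVenues` assumes that the response always contains a `groups` list and that every venue has at least one category. A more defensive variant of that step might look like the sketch below. `extract_venues` is a hypothetical helper, and the sample payload is hand-written to mimic only the fields accessed above; it is not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Return one row per venue that has at least one category.

    Hypothetical helper: rows follow the column order used in getNearbyVenues.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories', [])
        if not categories:
            continue  # skip uncategorized venues instead of raising IndexError
        location = venue.get('location', {})
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     categories[0]['name']))
    return rows


# Hand-written payload that imitates the structure parsed above.
sample = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Trattoria Example',
                   'location': {'lat': 43.651, 'lng': -79.382},
                   'categories': [{'name': 'Italian Restaurant'}]}},
        {'venue': {'name': 'Uncategorized Example',
                   'location': {'lat': 43.652, 'lng': -79.383},
                   'categories': []}},
    ]}]}
}

rows = extract_venues(sample, 'Example Neighborhood', 43.65, -79.38)
print(rows)
```

With a helper like this, an empty or malformed response yields an empty list for that neighborhood rather than stopping the whole loop.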

In [None]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'])


In [None]:
#List of all toronto venues
toronto_venues

We only need the rows whose venue category contains "Restaurant"

In [None]:
#Select only restaurants: keep rows whose category name contains 'Restaurant'
is_restaurant = toronto_venues['Venue Category'].str.contains('Restaurant', regex=False)
toronto_venues = toronto_venues[is_restaurant]
toronto_venues


To cluster the neighborhoods, we first need to one-hot encode the venue categories

In [None]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()


Now let's calculate, for each neighborhood, the share of its restaurants that falls into each category

In [None]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped
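
To make the meaning of these averages concrete, here is a small worked example on made-up data (the neighborhoods `A` and `B` and their venues are illustrative). Because each restaurant contributes a single 1 to exactly one category column, the mean per neighborhood is the fraction of that neighborhood's restaurants in each category, and every row sums to 1.

```python
import pandas as pd

# Made-up venues: four restaurants in A, two in B.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Italian Restaurant', 'Italian Restaurant',
                       'Sushi Restaurant', 'Thai Restaurant',
                       'Sushi Restaurant', 'Sushi Restaurant'],
})

# One-hot encode the category, then average the indicator columns per neighborhood.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
shares = onehot.groupby('Neighborhood').mean()
print(shares)
```

Here half of the restaurants in `A` are Italian, while `B` contains only sushi restaurants, so k-means compares neighborhoods by their restaurant mix rather than by the raw number of restaurants.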

Now it's time to cluster the neighborhoods with k-means. Suppose the number of clusters is 10

In [None]:
#Cluster neighborhoods
from sklearn.cluster import KMeans

# run k-means clustering
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=10, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
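
The choice of 10 clusters is an assumption. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below does this on synthetic points rather than on the restaurant data: the three blob centers, the noise level, and the range of k are all illustrative choices. The same loop could be run on `toronto_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (illustrative only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# Higher silhouette scores indicate tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A clear peak in the scores suggests a natural number of clusters; a flat curve suggests that the data do not separate cleanly and that any k is partly arbitrary.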

In [None]:
#Combine cluster labels and top restaurant with dataframe
df = df.sort_values(by = 'Neighborhood')
final_df = pd.concat([toronto_grouped['Neighborhood'], pd.DataFrame(kmeans.labels_, columns = ['Cluster Label'])], axis = 1)

top_restaurants = toronto_grouped.columns[1:][np.argmax(toronto_grouped.values[:, 1:], axis = 1)]
final_df = pd.concat([final_df, pd.DataFrame(top_restaurants, columns = ['Most Common Restaurant'])], axis = 1)

final_df = pd.merge(final_df, df, on = 'Neighborhood')
final_df

Draw the map of clusters

In [None]:
from geopy.geocoders import Nominatim
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
kclusters = 10
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(final_df['Latitude'], final_df['Longitude'], final_df['Neighborhood'], final_df['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Let's look at the dataframes of all the clusters one by one

In [None]:
final_df.loc[final_df['Cluster Label'] == 0]

In [None]:
final_df.loc[final_df['Cluster Label'] == 1]

In [None]:
final_df.loc[final_df['Cluster Label'] == 2]

In [None]:
final_df.loc[final_df['Cluster Label'] == 3]

In [None]:
final_df.loc[final_df['Cluster Label'] == 4]

In [None]:
final_df.loc[final_df['Cluster Label'] == 5]

In [None]:
final_df.loc[final_df['Cluster Label'] == 6]

In [None]:
final_df.loc[final_df['Cluster Label'] == 7]

In [None]:
final_df.loc[final_df['Cluster Label'] == 8]

In [None]:
final_df.loc[final_df['Cluster Label'] == 9]
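
The ten cells above can also be condensed into a single table that counts, per cluster, how many neighborhoods have each restaurant type as their most common one. The sketch below uses a small made-up frame with the same column names as `final_df`; the rows are illustrative and do not reproduce the actual clustering result.

```python
import pandas as pd

# Made-up stand-in for final_df with the columns used in the summary.
clustered = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Label': [0, 0, 1, 2, 2],
    'Most Common Restaurant': ['Italian Restaurant', 'Italian Restaurant',
                               'Sushi Restaurant', 'Thai Restaurant',
                               'Indian Restaurant'],
})

# Rows are cluster labels, columns are restaurant types, cells are neighborhood counts.
summary = pd.crosstab(clustered['Cluster Label'],
                      clustered['Most Common Restaurant'])
print(summary)
```

Reading the `Italian Restaurant` column of such a table shows at a glance which clusters are already dominated by Italian restaurants and which are not.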

As we can see, cluster 2 is ideal for my restaurant

In [None]:
final_df.loc[final_df['Cluster Label'] == 2]['Neighborhood'].values

As you can see, I can open my restaurant in Humber Summit, Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East, Rouge Hill, Port Union, or  
Highland Creek to have a chance of attracting more customers