# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Predicting the Best Location for Business Facility

### Introduction: Business Problem

A multinational company wants to set up a manufacturing facility for restaurant equipment and supplies in a Toronto neighborhood. The company's management wants a location close to many different types of restaurant, so that a diverse customer base brings in more orders.
To meet this requirement, they tasked the data science team with finding the neighborhood with the most diverse restaurant types.
The data science team gathered the data and applied K-means clustering to group neighborhoods, keyed by postal code, according to the mix of restaurant types they contain.


### Data

The postal code table scraped from Wikipedia and the downloaded geospatial coordinates were combined into one table. Some rows had 'Not assigned' in the Borough and Neighbourhood columns; those rows were dropped. A second dataset, obtained through the Foursquare venue search API, was used to explore the restaurants in Toronto's neighborhoods.
The 'Postal Code' field was used to combine the Wikipedia and Foursquare data, so that each restaurant type could be attributed to the postal code of a neighborhood. The ten most common restaurant categories were then listed for each postal code, and each postal code was placed on the map by its latitude and longitude.


Before we get the data and start exploring it, let's import all the libraries we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Download and Explore Dataset

Scrape the Wikipedia page and transform the data in the table into a pandas dataframe.

In [2]:
url= 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)
df1 = df[0]
df1

,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


Remove the missing values. The 'Borough' and 'Neighbourhood' columns contain 'Not assigned' placeholders. We replace these with NaN and drop every row that has no borough.

In [3]:
df1.replace("Not assigned", np.nan, inplace = True)
df1.dropna(subset=["Borough"], axis=0, inplace=True)

# reset the index after dropping the 'Not assigned' rows
df1.reset_index(drop=True, inplace=True)

# Set it None to display all rows in the dataframe
pd.set_option('display.max_rows', None)
display(df1)

,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
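
The cleaning step above can be checked on a few rows in isolation. This is a minimal sketch using four rows copied from the scraped table; it chains the same `replace`, `dropna` and `reset_index` calls as the notebook, but without `inplace=True`, so the original frame is left untouched.

```python
import numpy as np
import pandas as pd

# Four rows copied from the scraped Wikipedia table above
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Regent Park, Harbourfront'],
})

# Treat the 'Not assigned' placeholder as missing, drop rows without a borough,
# and renumber the remaining rows from 0
cleaned = (
    raw.replace('Not assigned', np.nan)
       .dropna(subset=['Borough'])
       .reset_index(drop=True)
)
print(cleaned)
```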


Download the latitude and longitude coordinates of each postal code, then merge them into the cleaned dataframe scraped from Wikipedia. This gives the location of every postal code in Toronto.

In [4]:
url1= 'http://cocl.us/Geospatial_data'
df_lo_lt = pd.read_csv(url1)

In [5]:
df_r = pd.merge(df1, df_lo_lt, how = 'left', on =['Postal Code']) # left-join the coordinates onto the postal code table
df_r

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
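
A left merge keeps every postal code from the Wikipedia table even when the coordinates file has no match for it; such rows simply get `NaN` coordinates. A minimal sketch on toy data showing this behaviour. `M0X` is a hypothetical postal code deliberately missing from the coordinates table, and `indicator=True` (a standard `pd.merge` option that the notebook does not use) flags the unmatched rows.

```python
import pandas as pd

# Two postal codes from the tables above, plus a hypothetical code 'M0X'
# that has no entry in the coordinates table
postal = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps all rows of `postal`; indicator=True adds a '_merge' column
# with 'both' for matched rows and 'left_only' for unmatched ones
merged = pd.merge(postal, coords, how='left', on='Postal Code', indicator=True)
print(merged)
```

Checking `merged['_merge']` after the real merge is a quick way to confirm that every postal code received coordinates.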


In [6]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_r['Borough'].unique()),
        df_r.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


#### Convert the address to latitude and longitude coordinates

In [7]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
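
The Foursquare search below uses a single 10 km radius around this point, so postal codes far from the centre may fall outside it. A minimal sketch, not part of the original analysis, that uses the haversine formula to check how far a postal code lies from the geocoded centre. It assumes a spherical Earth with a mean radius of 6371 km, which is accurate enough for this purpose; the function name `haversine_km` is our own.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees (spherical Earth)."""
    r = 6371.0  # assumed mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Geocoded centre of Toronto, from the output above
centre = (43.6534817, -79.3839347)

# Coordinates of two postal codes, from the merged table above
postal_codes = {
    'M5B': (43.657162, -79.378937),  # Garden District, Ryerson
    'M1B': (43.806686, -79.194353),  # Malvern, Rouge
}

for code, (lat, lon) in postal_codes.items():
    d = haversine_km(centre[0], centre[1], lat, lon)
    print(f'{code}: {d:.1f} km from the centre, inside a 10 km radius: {d <= 10}')
```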


#### Create a map of the postal codes in Toronto

In [8]:
# create map of Toronto using latitude and longitude values
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each postal code
for lat, lng, label in zip(df_r['Latitude'], df_r['Longitude'], df_r['Postal Code']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_downtown)

map_downtown

#### Foursquare

Now that we have the locations of the postal codes and neighbourhoods, let's use the Foursquare API to get information on restaurants around Toronto.

We are interested in all types of restaurants, and more specifically in the diversity of restaurant types in each neighbourhood. We search for restaurants within a 10 km radius of the city centre.

In [9]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [10]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 10000 # search radius in metres (10 km)
search_query = 'Restaurant'

url2 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

url2

'https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=43.6534817,-79.3839347&v=20180605&query=Restaurant&radius=10000&limit=100'
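
The request URL above is assembled with `str.format`, which does not escape the values. A minimal alternative sketch that builds the same query string with the standard library's `urllib.parse.urlencode`, which percent-encodes each value (for example the comma in `ll` becomes `%2C`). The credentials are placeholders; the parameter names are the ones already used in the notebook's URL.

```python
from urllib.parse import urlencode

# Placeholder credentials: substitute your own Foursquare keys
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'll': '{},{}'.format(43.6534817, -79.3839347),  # latitude,longitude of the search centre
    'v': '20180605',      # API version date, as in the notebook
    'query': 'Restaurant',
    'radius': 10000,      # metres
    'limit': 100,
}

# urlencode escapes every key and value and joins them with '&'
url = 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)
print(url)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` performs this encoding for you.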

In [11]:
results = requests.get(url2).json()
results

{'meta': {'code': 200, 'requestId': '5f43f0260c8de740f7a6135a'},
 'response': {'venues': [{'id': '4ad4c05ff964a52048f720e3',
    'name': 'Hemispheres Restaurant & Bistro',
    'location': {'address': '110 Chestnut Street',
     'lat': 43.65488413420439,
     'lng': -79.38593077371578,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.65488413420439,
       'lng': -79.38593077371578}],
     'distance': 224,
     'postalCode': 'M5G 1R3',
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['110 Chestnut Street',
      'Toronto ON M5G 1R3',
      'Canada']},
    'categories': [{'id': '4bf58dd8d48988d14e941735',
      'name': 'American Restaurant',
      'pluralName': 'American Restaurants',
      'shortName': 'American',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1598288356',
    'hasPerk': False},
   {'id': '

We extract the fields we need and discard the rest. Specifically, we keep each restaurant's postal code, name, category, address, and latitude/longitude coordinates; all other fields are dropped.

In [12]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()



,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,venuePage.id,location.neighborhood
0,4ad4c05ff964a52048f720e3,Hemispheres Restaurant & Bistro,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",v-1598288356,False,110 Chestnut Street,43.654884,-79.385931,"[{'label': 'display', 'lat': 43.65488413420439...",224,M5G 1R3,CA,Toronto,ON,Canada,"[110 Chestnut Street, Toronto ON M5G 1R3, Canada]",,,
1,4ad4c05cf964a52006f620e3,Victoria's Restaurant,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",v-1598288356,False,37 King Street East,43.649298,-79.376431,"[{'label': 'display', 'lat': 43.64929834396347...",763,M5C 1E9,CA,Toronto,ON,Canada,[37 King Street East (at Le Meridien King Edwa...,at Le Meridien King Edward Hotel,498556908.0,
2,4ada5d5bf964a520e92121e3,The Hot House Restaurant & Bar,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",v-1598288356,False,35 Church St,43.648824,-79.373702,"[{'label': 'display', 'lat': 43.64882370529773...",973,M5E 1T3,CA,Toronto,ON,Canada,"[35 Church St (at Front St E), Toronto ON M5E ...",at Front St E,,
3,4b072e9df964a52009f922e3,Sky Dragon Chinese Restaurant 龍翔酒樓,"[{'id': '4bf58dd8d48988d1f5931735', 'name': 'D...",v-1598288356,False,280 Spadina Ave.,43.652783,-79.398174,"[{'label': 'display', 'lat': 43.65278331265585...",1149,,CA,Toronto,ON,Canada,"[280 Spadina Ave. (at Dundas St. W.), Toronto ...",at Dundas St. W.,,
4,4ad4c060f964a5207ff720e3,Rol San Restaurant 龍笙棧,"[{'id': '4bf58dd8d48988d1f5931735', 'name': 'D...",v-1598288356,False,323 Spadina Ave.,43.654318,-79.39865,"[{'label': 'display', 'lat': 43.65431754076345...",1188,M5T 2E9,CA,Toronto,ON,Canada,"[323 Spadina Ave. (at D'Arcy St.), Toronto ON ...",at D'Arcy St.,,Kensington Market


In [13]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy to avoid SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

# drop unneeded fields; errors='ignore' skips columns that are absent from a given response
dataframe_filtered.drop(['labeledLatLngs', 'distance', 'cc', 'city', 'state', 'country', 'formattedAddress', 'crossStreet', 'neighborhood', 'id'], axis=1, inplace=True, errors='ignore')
# venues without a postal code cannot be matched to the Wikipedia table
dataframe_filtered.dropna(subset=["postalCode"], axis=0, inplace=True)



In [14]:
# keep the first three characters of each postal code (the forward sortation area, e.g. 'M5G'),
# which is the key used in the Wikipedia table
dataframe_filtered.insert(0, "Postal Code", dataframe_filtered['postalCode'].str[:3])
dataframe_filtered.drop(['postalCode'], axis=1, inplace=True)
nearby_venues = dataframe_filtered
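
The extraction steps above can be traced end to end on a tiny, offline input. A minimal sketch, assuming a trimmed venue list: the first record mirrors the Hemispheres entry in the response shown earlier, the second is hypothetical and has no postal code. It flattens the records with `pd.json_normalize`, takes the first category name, strips the `location.` prefix and keeps the three-character forward sortation area.

```python
import pandas as pd

# Trimmed venue records; the second one is hypothetical and lacks a postal code
venues = [
    {'name': 'Hemispheres Restaurant & Bistro',
     'categories': [{'name': 'American Restaurant', 'primary': True}],
     'location': {'lat': 43.65488413420439, 'lng': -79.38593077371578, 'postalCode': 'M5G 1R3'}},
    {'name': 'Hypothetical Venue',
     'categories': [],
     'location': {'lat': 43.65, 'lng': -79.38}},
]

# Nested dict keys become columns such as 'location.lat'; lists stay as cell values
df = pd.json_normalize(venues)

# First listed category name, or None when the list is empty
df['categories'] = df['categories'].apply(lambda cats: cats[0]['name'] if cats else None)

# Strip the 'location.' prefix from column names
df.columns = [c.split('.')[-1] for c in df.columns]

# Forward sortation area (missing postal codes stay NaN), then drop unmatched venues
df.insert(0, 'Postal Code', df['postalCode'].str[:3])
df = df.dropna(subset=['Postal Code'])
print(df[['Postal Code', 'name', 'categories', 'lat', 'lng']])
```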

In [15]:
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))

There are 19 unique categories.


### Analyze Each Postal Code

In [16]:
# one hot encoding
downtown_onehot = pd.get_dummies(nearby_venues[['categories']], prefix="", prefix_sep="", dtype=int)

# add the postal code column back to the dataframe
downtown_onehot['Postal Code'] = nearby_venues['Postal Code'] 

# move the postal code column to the first position
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
downtown_onehot = downtown_onehot[fixed_columns]

downtown_onehot

,Postal Code,American Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space,Indian Restaurant,Italian Restaurant,Korean Restaurant,New American Restaurant,Nightclub,Noodle House,Restaurant,Thai Restaurant,Theme Restaurant,Vietnamese Restaurant,Wine Bar
0,M5G,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M5C,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,M5E,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M5T,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
5,M5V,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
7,M5T,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,M5T,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
9,M5T,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10,M5B,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
11,M5V,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


In [17]:
downtown_grouped = downtown_onehot.groupby('Postal Code').mean().reset_index()
downtown_grouped

,Postal Code,American Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space,Indian Restaurant,Italian Restaurant,Korean Restaurant,New American Restaurant,Nightclub,Noodle House,Restaurant,Thai Restaurant,Theme Restaurant,Vietnamese Restaurant,Wine Bar
0,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,M4W,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,M4X,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
4,M5B,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M5C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
6,M5E,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M5G,0.125,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.125,0.0
8,M5H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.2,0.0,0.0,0.4,0.0,0.0,0.0,0.0
9,M5R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
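
Averaging the one-hot columns per postal code gives the share of each category among that postal code's venues, which is what the table above shows. A minimal sketch on four hypothetical venues, modelled on the M5B and M5E rows; `dtype=int` is passed so the dummies are 0/1 integers, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Four hypothetical venues in two postal codes
venues = pd.DataFrame({
    'Postal Code': ['M5B', 'M5B', 'M5E', 'M5E'],
    'categories': ['Breakfast Spot', 'Diner', 'American Restaurant', 'American Restaurant'],
})

# One 0/1 column per category
onehot = pd.get_dummies(venues['categories'], dtype=int)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# Mean of the 0/1 columns = fraction of each postal code's venues in that category
freq = onehot.groupby('Postal Code').mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1, so postal codes with different numbers of venues become directly comparable.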


In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

List the 10 most common venue categories in each postal code.

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = downtown_grouped['Postal Code']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4M,Vietnamese Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
1,M4W,Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
2,M4X,Chinese Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Dim Sum Restaurant,Diner,Event Space,Italian Restaurant
3,M4Y,Thai Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
4,M5B,Breakfast Spot,Diner,Wine Bar,Indian Restaurant,Bar,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Event Space,Italian Restaurant
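
For postal codes with only one or two restaurants, most of the ten "most common" venues have frequency 0, so their ranking after the first entries is arbitrary. A minimal sketch of a variation (the helper `top_categories` is hypothetical, not the notebook's `return_most_common_venues`) that ranks only categories with a non-zero share and uses a stable sort so ties keep their column order. The M5G frequencies are copied from the grouped table above.

```python
import pandas as pd

def top_categories(freq_row, n):
    """Return up to n category names with non-zero frequency, most frequent first.

    Zero-frequency categories are excluded rather than ranked, so a postal code
    with a single venue yields one name instead of n arbitrarily ordered zeros.
    """
    nonzero = freq_row[freq_row > 0]
    # mergesort is stable: categories with equal share keep their original order
    return list(nonzero.sort_values(ascending=False, kind='mergesort').index[:n])

# Frequency row for M5G, copied from the output above (one zero column kept for contrast)
m5g = pd.Series({
    'American Restaurant': 0.125, 'Chinese Restaurant': 0.375, 'Italian Restaurant': 0.125,
    'Restaurant': 0.125, 'Theme Restaurant': 0.125, 'Vietnamese Restaurant': 0.125,
    'Wine Bar': 0.0,
})
print(top_categories(m5g, 3))
```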


#### Top 3 most common venue categories in each postal code

In [20]:
num_top_venues = 3

for hood in downtown_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4M----
                   venue  freq
0  Vietnamese Restaurant   1.0
1    American Restaurant   0.0
2      Korean Restaurant   0.0


----M4W----
                 venue  freq
0           Restaurant   1.0
1  American Restaurant   0.0
2    Korean Restaurant   0.0


----M4X----
                 venue  freq
0   Chinese Restaurant   1.0
1  American Restaurant   0.0
2    Korean Restaurant   0.0


----M4Y----
                 venue  freq
0      Thai Restaurant   1.0
1  American Restaurant   0.0
2    Korean Restaurant   0.0


----M5B----
                 venue  freq
0       Breakfast Spot   0.5
1                Diner   0.5
2  American Restaurant   0.0


----M5C----
                 venue  freq
0           Restaurant   1.0
1  American Restaurant   0.0
2    Korean Restaurant   0.0


----M5E----
                   venue  freq
0    American Restaurant   1.0
1      Korean Restaurant   0.0
2  Vietnamese Restaurant   0.0


----M5G----
                 venue  freq
0   Chinese Restaurant  0.38
1  A

### Cluster the Postal Codes

In [21]:
# set number of clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns='Postal Code')

# run k-means clustering (n_init=10 matches the default used before scikit-learn 1.4)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 0, 3, 0, 1, 0, 0, 0, 2])
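
K-means groups rows whose category-frequency vectors are close in Euclidean distance, so postal codes are clustered by the composition of their restaurant mix rather than by how many categories they contain. A minimal sketch on four hypothetical frequency vectors over three categories, showing that rows dominated by the same category end up with the same label. `n_init` is set explicitly because its default changed in scikit-learn 1.4.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category shares (columns: Chinese Restaurant, Restaurant, Thai Restaurant)
X = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # rows with the same dominant category share a label
print(km.cluster_centers_)  # each centre is the mean share vector of its members
```

The label numbers themselves are arbitrary; only which rows share a label carries meaning.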

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = df_r

# join the cluster labels and top venues onto df_r, which holds latitude/longitude for each postal code
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Postal Code'), on='Postal Code')
# drop postal codes that had no restaurants in the Foursquare results (no cluster label)
downtown_merged.dropna(subset=['Cluster Labels'], axis=0, inplace=True)

# reset the index after dropping rows
downtown_merged.reset_index(drop=True, inplace=True)
# the join introduced NaNs, turning the labels into floats; convert them back to int
downtown_merged = downtown_merged.astype({'Cluster Labels': int})
downtown_merged # check the last columns!

,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Breakfast Spot,Diner,Wine Bar,Indian Restaurant,Bar,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Event Space,Italian Restaurant
1,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
2,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,American Restaurant,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space,Wine Bar
3,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Chinese Restaurant,Italian Restaurant,Restaurant,Vietnamese Restaurant,American Restaurant,Theme Restaurant,Noodle House,Nightclub,New American Restaurant,Korean Restaurant
4,M6G,Downtown Toronto,Christie,43.669542,-79.422564,4,Korean Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
5,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Restaurant,Indian Restaurant,New American Restaurant,Wine Bar,Event Space,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant
6,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Vietnamese Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
7,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,2,Event Space,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Italian Restaurant
8,M5S,Downtown Toronto,"University of Toronto, Harbord",43.662696,-79.400049,1,Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
9,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.653206,-79.400049,0,Chinese Restaurant,Breakfast Spot,Caribbean Restaurant,Noodle House,Nightclub,Korean Restaurant,Dim Sum Restaurant,Wine Bar,Event Space,Bar


In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighbourhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

We now have clusters of postal codes grouped by the mix of restaurant categories present in each. Next, we analyse each cluster and draw conclusions against the customer's requirements.

### Methodology

In this project, we help the customer establish their business. They want a diverse range of restaurant types around their facility, so that they can reach more kinds of customers for their business of supplying and servicing restaurant appliances and equipment.

In the first step, we collected data on the postal codes of Toronto's neighbourhoods. We cleaned and merged the data to extract information on the restaurants in each neighbourhood, then transformed it into dataframes listing the ten most common restaurant categories for each postal code.

In the second step, we used K-means clustering to group postal codes with similar restaurant mixes into clusters. In this case, we divided them into five clusters.

In the final step, we analysed each cluster to find the best possible location for the new facility according to the customer's requirements.

### Analysis

#### Cluster 1 (label 0)

In [24]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[0]+[2] + list(range(5,downtown_merged.shape[1]))]]

,Postal Code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5B,"Garden District, Ryerson",0,Breakfast Spot,Diner,Wine Bar,Indian Restaurant,Bar,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Event Space,Italian Restaurant
2,M5E,Berczy Park,0,American Restaurant,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space,Wine Bar
3,M5G,Central Bay Street,0,Chinese Restaurant,Italian Restaurant,Restaurant,Vietnamese Restaurant,American Restaurant,Theme Restaurant,Noodle House,Nightclub,New American Restaurant,Korean Restaurant
5,M5H,"Richmond, Adelaide, King",0,Restaurant,Indian Restaurant,New American Restaurant,Wine Bar,Event Space,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant
6,M4M,Studio District,0,Vietnamese Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
9,M5T,"Kensington Market, Chinatown, Grange Park",0,Chinese Restaurant,Breakfast Spot,Caribbean Restaurant,Noodle House,Nightclub,Korean Restaurant,Dim Sum Restaurant,Wine Bar,Event Space,Bar
10,M5V,"CN Tower, King and Spadina, Railway Lands, Har...",0,Wine Bar,Indian Restaurant,Bar,Restaurant,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
12,M4X,"St. James Town, Cabbagetown",0,Chinese Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Dim Sum Restaurant,Diner,Event Space,Italian Restaurant


#### Cluster 2 (label 1)

In [25]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[0] + [2] + list(range(5, downtown_merged.shape[1]))]]

,Postal Code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M5C,St. James Town,1,Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
8,M5S,"University of Toronto, Harbord",1,Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space
11,M4W,Rosedale,1,Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space


#### Cluster 3 (label 2)

In [26]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[0]+[2] + list(range(5, downtown_merged.shape[1]))]]

,Postal Code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,M5R,"The Annex, North Midtown, Yorkville",2,Event Space,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Italian Restaurant


#### Cluster 4 (label 3)

In [27]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[0]+[2] + list(range(5, downtown_merged.shape[1]))]]

,Postal Code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,M4Y,Church and Wellesley,3,Thai Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space


#### Cluster 5 (label 4)

In [28]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[0]+[2] + list(range(5, downtown_merged.shape[1]))]]

,Postal Code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M6G,Christie,4,Korean Restaurant,Wine Bar,Indian Restaurant,Bar,Breakfast Spot,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Diner,Event Space


### Conclusion

Each cluster has its own characteristics. Given the requirement for diverse restaurant types, Cluster 1 (cluster label 0) is the most suitable: it contains the most postal codes, and several of them (for example M5G and M5H) have more than one restaurant category. To choose among its postal codes, we selected the one covering the largest number of neighbourhoods. In this case, M5V is the best option for setting up the new facility.
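
The selection rule stated above (the postal code covering the most neighbourhoods) can be expressed in a few lines. A minimal sketch on hypothetical rows: the postal codes and neighbourhood names are invented for illustration. On the notebook's data it would be applied to the cluster-0 rows of `downtown_merged`, whose `Neighbourhood` strings are complete even though the displayed table truncates them.

```python
import pandas as pd

# Hypothetical rows shaped like the cluster tables above
cluster_rows = pd.DataFrame({
    'Postal Code': ['A1A', 'B2B', 'C3C'],
    'Neighbourhood': ['North Park', 'East Market, Old Town', 'Harbour, Station, Riverside, Mill'],
})

# Count the comma-separated neighbourhood names listed for each postal code
cluster_rows['n_neighbourhoods'] = cluster_rows['Neighbourhood'].str.split(',').str.len()

# Postal code with the largest count (the first one wins on ties)
best = cluster_rows.loc[cluster_rows['n_neighbourhoods'].idxmax(), 'Postal Code']
print(cluster_rows)
print('Postal code with the most neighbourhoods:', best)
```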

In [30]:
downtown_merged.loc[10]

Postal Code                                                             M5V
Borough                                                    Downtown Toronto
Neighbourhood             CN Tower, King and Spadina, Railway Lands, Har...
Latitude                                                            43.6289
Longitude                                                          -79.3944
Cluster Labels                                                            0
1st Most Common Venue                                              Wine Bar
2nd Most Common Venue                                     Indian Restaurant
3rd Most Common Venue                                                   Bar
4th Most Common Venue                                            Restaurant
5th Most Common Venue                                        Breakfast Spot
6th Most Common Venue                                  Caribbean Restaurant
7th Most Common Venue                                    Chinese Restaurant
8th Most Com