# Capstone Project - The Battle of Neighborhoods (Week 2)

## Contents
* [Introduction & Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results & Discussions](#res)
* [Conclusion](#concl)

## Introduction & Business Problem<a name="introduction"></a>

**Introduction:** Scarborough is an administrative district and former city in Toronto, Ontario, Canada. Scarborough is a popular destination for new immigrants in Canada to reside. As a result, it is one of the most diverse and multicultural areas in the Greater Toronto Area, being home to various religious groups and places of worship. It includes some of Toronto's popular natural landmarks, such as the Toronto Zoo and Rouge Park. The northeast corner of Scarborough is largely rural with some of Toronto’s last remaining farms, leading to Scarborough’s reputation of being greener than any other part of Toronto.

**Business Problem:** The objective of this project is to provide insights to anyone immigrating to Canada who has chosen Scarborough as the place to live and start a business. The project segments and clusters areas of Scarborough according to the most common venue categories reported by Foursquare. Knowing the neighborhoods and the types of business thriving in them can help narrow down and finalize a business plan.

## Data<a name="data"></a>

The postal code data is taken from the following Wikipedia page:
* https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Latitude/longitude coordinates are downloaded as a CSV file from the following link:
* http://cocl.us/Geospatial_data

Foursquare API will be used to get the data to explore the neighborhood.

In [57]:
# Import libraries

import requests
import pandas as pd
import numpy as np

from geopy.geocoders import Nominatim
import folium

In [58]:
# Used wiki link to get the postal codes

wikipedia_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
raw_random_wikipedia_page=requests.get(wikipedia_link)
page=raw_random_wikipedia_page.text

In [31]:
# Used BeautifulSoup to parse the HTML

from bs4 import BeautifulSoup
soup = BeautifulSoup(page, 'html.parser')

In [32]:
# Pick up text in 'td' tags

x=[t.text for t in soup.find_all('td')]

In [33]:
# Remove data not required

rng=x.index("")
y=x[0:rng]

In [34]:
# Define the dataframe columns
column_names = ['PostalCode', 'Borough', 'Neighborhood'] 

# Instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [36]:
# check the header

neighborhoods

Unnamed: 0,PostalCode,Borough,Neighborhood


In [37]:
# Insert data into the dataframe using for loop

rows = []
for i in range(0, rng - rng % 3, 3):
    # cells arrive in groups of three: postal code, borough, neighborhood
    rows.append({'PostalCode': y[i],
                 'Borough': y[i+1],
                 'Neighborhood': y[i+2]})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [41]:
# remove new line characters

neighborhoods=neighborhoods.replace('\n','',regex=True)

In [42]:
# remove rows where Borough is "Not Assigned"

neighborhoods=neighborhoods[neighborhoods.Borough != 'Not assigned']

In [43]:
# group the neighborhoods

nb=neighborhoods.groupby(['PostalCode','Borough'])['Neighborhood'].apply(lambda t: ','.join(t)).to_frame().reset_index()

In [44]:
# random check

nb.loc[nb['PostalCode'] == 'M5G']

Unnamed: 0,PostalCode,Borough,Neighborhood
57,M5G,Downtown Toronto,Central Bay Street


In [45]:
# assign values to Neighborhood that doesn't have any value

nb.loc[nb.Neighborhood == 'Not assigned', 'Neighborhood'] = nb.Borough

In [46]:
nb.shape
print('There are {} rows in the dataframe' .format(nb.shape[0]))

There are 103 rows in the dataframe


In [47]:
nb.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [48]:
# read url to get the coordinates

file=pd.read_csv('http://cocl.us/Geospatial_data')

In [50]:
file.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [51]:
# Merge neighborhood and coordinates

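# match rows on the postal code rather than on row position
mrg=nb.merge(file,left_on='PostalCode',right_on='Postal Code',how='left')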
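
Combining the two tables is only correct when every row ends up with the coordinates of its own postal code. A minimal plain-Python sketch of pairing rows by key makes that requirement explicit. The helper name `join_on_postal_code` and the two tiny tables are illustrative assumptions; the coordinates are copied from the output above.

```python
def join_on_postal_code(areas, coords):
    """Attach latitude/longitude to each area row by matching postal codes."""
    lookup = {c["Postal Code"]: c for c in coords}
    joined = []
    for area in areas:
        match = lookup.get(area["PostalCode"])
        joined.append({
            **area,
            # None marks a postal code with no coordinates in the lookup
            "Latitude": match["Latitude"] if match else None,
            "Longitude": match["Longitude"] if match else None,
        })
    return joined


# The two inputs are deliberately in a different order
areas = [
    {"PostalCode": "M1C", "Borough": "Scarborough"},
    {"PostalCode": "M1B", "Borough": "Scarborough"},
]
coords = [
    {"Postal Code": "M1B", "Latitude": 43.806686, "Longitude": -79.194353},
    {"Postal Code": "M1C", "Latitude": 43.784535, "Longitude": -79.160497},
]
joined = join_on_postal_code(areas, coords)
for row in joined:
    print(row["PostalCode"], row["Latitude"], row["Longitude"])
```

Because the lookup is keyed by postal code, the result does not depend on how either input happens to be sorted.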

In [52]:
mrg.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476


In [53]:
# drop extra postal code column
neighborhoods=mrg.drop(['Postal Code'],axis=1)
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [54]:
# Filter data to create Scarborough dataframe

scarb_data = neighborhoods[neighborhoods['Borough'] == ('Scarborough')].reset_index(drop=True)
scarb_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848


##### <b>Use geopy library to get the latitude and longitude values of Scarborough</b>

In [59]:
address = 'Scarborough, ON'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough are 43.773077, -79.257774.


In [90]:
# create map of Scarborough using latitude and longitude values
map_scarb = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(scarb_data['Latitude'], scarb_data['Longitude'], scarb_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scarb)  
    
map_scarb

Next, we use the Foursquare API to explore the neighborhoods and segment them.

In [61]:
CLIENT_ID = 'B3RG0FZMFK04VGBWNNBFSPU40XKJUN0GJL0ZA1MJY3ONRLYM' # your Foursquare ID
CLIENT_SECRET = 'Y1VLGS4MGY5MCDNJ3NXYGBKTNY3CSTLFOMTEWJCJO5FXCDCE' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT=100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: B3RG0FZMFK04VGBWNNBFSPU40XKJUN0GJL0ZA1MJY3ONRLYM
CLIENT_SECRET:Y1VLGS4MGY5MCDNJ3NXYGBKTNY3CSTLFOMTEWJCJO5FXCDCE


#### Create a function to repeat the same process for all the neighborhoods in Scarborough

In [62]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
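
The list comprehension above assumes that every response contains a `groups` entry and that every venue carries at least one category. The following hedged sketch performs the same extraction with those cases guarded, run on a hand-written payload. The helper name `extract_venues` and the sample payload are assumptions that only mirror the keys used in `getNearbyVenues`; they are not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten an explore-style payload into one tuple per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hypothetical payload with one categorised and one uncategorised venue
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 43.77, "lng": -79.21},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Uncategorised Place",
                   "location": {"lat": 43.78, "lng": -79.22},
                   "categories": []}},
    ]}]}
}

rows = extract_venues(sample_payload, "Woburn", 43.770992, -79.216917)
empty = extract_venues({"response": {}}, "Nowhere", 0.0, 0.0)
print(len(rows), len(empty))
```

A neighborhood with no venues then contributes an empty list rather than stopping the whole loop.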

#### The code below runs the function above on each neighborhood and creates a new dataframe called *scarb_venues*.

In [63]:
scarb_venues = getNearbyVenues(names=scarb_data['Neighborhood'],
                                   latitudes=scarb_data['Latitude'],
                                   longitudes=scarb_data['Longitude']
                                  )

Rouge,Malvern
Highland Creek,Rouge Hill,Port Union
Guildwood,Morningside,West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park,Ionview,Kennedy Park
Clairlea,Golden Mile,Oakridge
Cliffcrest,Cliffside,Scarborough Village West
Birch Cliff,Cliffside West
Dorset Park,Scarborough Town Centre,Wexford Heights
Maryvale,Wexford
Agincourt
Clarks Corners,Sullivan,Tam O'Shanter
Agincourt North,L'Amoreaux East,Milliken,Steeles East
L'Amoreaux West,Steeles West
Upper Rouge


In [64]:
scarb_venues.shape

(88, 7)

In [65]:
scarb_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge,Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
3,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood,Morningside,West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


Let's check how many venues were returned for each neighborhood

In [66]:
scarb_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",3,3,3,3,3,3
"Birch Cliff,Cliffside West",4,4,4,4,4,4
Cedarbrae,7,7,7,7,7,7
"Clairlea,Golden Mile,Oakridge",10,10,10,10,10,10
"Clarks Corners,Sullivan,Tam O'Shanter",9,9,9,9,9,9
"Cliffcrest,Cliffside,Scarborough Village West",3,3,3,3,3,3
"Dorset Park,Scarborough Town Centre,Wexford Heights",7,7,7,7,7,7
"East Birchmount Park,Ionview,Kennedy Park",7,7,7,7,7,7
"Guildwood,Morningside,West Hill",6,6,6,6,6,6


#### Let's find out how many unique categories can be curated from all the returned venues

In [67]:
print('There are {} unique categories.'.format(len(scarb_venues['Venue Category'].unique())))

There are 54 unique categories.


## Methodology<a name="methodology"></a>

This project identifies the venues in each Scarborough neighborhood and their categories. The idea is **NOT** to target any specific category but to show the whole spectrum of businesses currently operating in the region.

In the first step we collected the required **data: location and type (category) of every venue**.

The second step was to calculate and explore the '**venues count**' across the different neighborhoods of Scarborough.

The third step is to cluster those neighborhoods (using **k-means clustering**) to identify the general zones where businesses currently dominate.

#### Analyzing Each Neighborhood

In [68]:
# one hot encoding
scarb_onehot = pd.get_dummies(scarb_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarb_onehot['Neighborhood'] = scarb_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarb_onehot.columns[-1]] + list(scarb_onehot.columns[:-1])
scarb_onehot = scarb_onehot[fixed_columns]

scarb_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Burger Joint,...,Playground,Rental Car Location,Sandwich Place,Shopping Mall,Skating Rink,Soccer Field,Spa,Thai Restaurant,Train Station,Vietnamese Restaurant
0,"Rouge,Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek,Rouge Hill,Port Union",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Highland Creek,Rouge Hill,Port Union",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood,Morningside,West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood,Morningside,West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [69]:
scarb_onehot.shape

(88, 55)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [70]:
scarb_grouped = scarb_onehot.groupby('Neighborhood').mean().reset_index()
scarb_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,...
0,Agincourt,0.0,...
1,"Agincourt North,L'Amoreaux East,Milliken,Steeles East",0.0,...
2,"Birch Cliff,Cliffside West",0.0,...
3,Cedarbrae,0.0,...
4,"Clairlea,Golden Mile,Oakridge",0.0,...
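
Averaging one-hot columns within a neighborhood gives the share of that neighborhood's venues that fall into each category. A minimal plain-Python sketch of that quantity follows; the helper name `category_shares` is hypothetical, and the sample pairs are copied from the `scarb_venues` preview above.

```python
from collections import Counter, defaultdict


def category_shares(pairs):
    """Map each neighborhood to {category: fraction of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }


# (neighborhood, venue category) pairs from the first five venues
pairs = [
    ("Rouge,Malvern", "Fast Food Restaurant"),
    ("Highland Creek,Rouge Hill,Port Union", "Bar"),
    ("Highland Creek,Rouge Hill,Port Union", "Moving Target"),
    ("Guildwood,Morningside,West Hill", "Pizza Place"),
    ("Guildwood,Morningside,West Hill", "Electronics Store"),
]
shares = category_shares(pairs)
print(shares["Highland Creek,Rouge Hill,Port Union"])
```

Categories that never occur in a neighborhood are simply absent here, whereas the dataframe stores them as explicit zeros so that every row has the same columns for clustering.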

#### Let's print each neighborhood along with the top 5 most common venues

In [71]:
num_top_venues = 5

for hood in scarb_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = scarb_grouped[scarb_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
            venue  freq
0  Clothing Store  0.25
1          Lounge  0.25
2    Skating Rink  0.25
3  Breakfast Spot  0.25
4       Pet Store  0.00


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                 venue  freq
0                 Park  0.67
1           Playground  0.33
2    Accessories Store  0.00
3   Italian Restaurant  0.00
4  Japanese Restaurant  0.00


----Birch Cliff,Cliffside West----
                   venue  freq
0  General Entertainment  0.25
1           Skating Rink  0.25
2                   Café  0.25
3        College Stadium  0.25
4      Accessories Store  0.00


----Cedarbrae----
                 venue  freq
0   Athletics & Sports  0.14
1               Bakery  0.14
2                 Bank  0.14
3      Thai Restaurant  0.14
4  Fried Chicken Joint  0.14


----Clairlea,Golden Mile,Oakridge----
          venue  freq
0        Bakery   0.2
1      Bus Line   0.2
2  Intersection   0.1
3          Park   0.1
4  Soccer Field   0.1


----Clark

#### Function to sort the venues in descending order.

In [72]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
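
Many categories tie at the same frequency (four categories share 0.25 in Agincourt, for example), so the order among tied categories depends on the sort implementation. The small sketch below breaks ties alphabetically to make the ranking reproducible. `top_categories` is a hypothetical helper, not part of the notebook, and the frequencies are those printed above for Agincourt.

```python
def top_categories(freqs, n=5):
    """Return the n most frequent categories, ties broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


agincourt = {
    "Clothing Store": 0.25,
    "Lounge": 0.25,
    "Skating Rink": 0.25,
    "Breakfast Spot": 0.25,
    "Pet Store": 0.0,
}
top3 = top_categories(agincourt, n=3)
print(top3)  # ['Breakfast Spot', 'Clothing Store', 'Lounge']
```

A zero-frequency category can still appear in a top-5 list when a neighborhood has fewer than five distinct categories, so the lower ranks should be read with care.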

Now let's create the new dataframe and display the top 5 venues for each neighborhood.

In [73]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarb_grouped['Neighborhood']

for ind in np.arange(scarb_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarb_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Skating Rink,Breakfast Spot,Lounge,Clothing Store,Vietnamese Restaurant
1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Park,Playground,Vietnamese Restaurant,Caribbean Restaurant,General Entertainment
2,"Birch Cliff,Cliffside West",Café,General Entertainment,Skating Rink,College Stadium,Caribbean Restaurant
3,Cedarbrae,Thai Restaurant,Athletics & Sports,Bakery,Bank,Fried Chicken Joint
4,"Clairlea,Golden Mile,Oakridge",Bus Line,Bakery,Metro Station,Soccer Field,Intersection


### Run *k*-means to cluster the neighborhood into 5 clusters.

In [75]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

scarb_grouped_clustering = scarb_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarb_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 2, 4, 1], dtype=int32)
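
For readers unfamiliar with the algorithm, here is a compact plain-Python sketch of Lloyd's iteration, the basic k-means procedure. It is an illustration under simplifying assumptions (fixed starting centroids, squared Euclidean distance, made-up 2-D points), not scikit-learn's implementation, which adds smarter initialisation and other refinements.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:  # converged: nothing moved
            break
        centroids = updated
    return labels, centroids


# Two well-separated groups of made-up points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centres = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the notebook each point is a neighborhood's vector of category frequencies rather than a 2-D coordinate, but the two alternating steps are the same.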

Create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.

In [76]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

scarb_merged = scarb_data

# merge the sorted venues with scarb_data to add the cluster label and top venues for each neighborhood
scarb_merged = scarb_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scarb_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,2.0,Fast Food Restaurant,Vietnamese Restaurant,Chinese Restaurant,Grocery Store,General Entertainment
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,0.0,Bar,Moving Target,Vietnamese Restaurant,Chinese Restaurant,General Entertainment
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,1.0,Electronics Store,Rental Car Location,Breakfast Spot,Pizza Place,Medical Center
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1.0,Coffee Shop,Insurance Office,Korean Restaurant,Vietnamese Restaurant,Hakka Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,1.0,Thai Restaurant,Athletics & Sports,Bakery,Bank,Fried Chicken Joint


In [78]:
# replace NaN with 0
scarb_merged.fillna(0,inplace=True)
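
Filling every missing value with 0 also assigns label 0 to a neighborhood for which no venues were returned, and 0 is itself a valid cluster label. A minimal sketch of keeping such rows distinguishable with a sentinel is shown below. The value -1, the helper name `attach_labels`, and the tiny label mapping are assumptions for illustration.

```python
NO_CLUSTER = -1  # assumed sentinel meaning "no venues, hence no cluster"


def attach_labels(neighborhoods, labels_by_name):
    """Pair each neighborhood with its cluster label or the sentinel."""
    return [(name, labels_by_name.get(name, NO_CLUSTER))
            for name in neighborhoods]


# Labels for two neighborhoods as shown in the merged table above
labels_by_name = {"Rouge,Malvern": 2, "Woburn": 1}
result = attach_labels(["Rouge,Malvern", "Woburn", "Upper Rouge"],
                       labels_by_name)
print(result)  # [('Rouge,Malvern', 2), ('Woburn', 1), ('Upper Rouge', -1)]
```

In pandas the same idea could be written as `scarb_merged['Cluster Labels'].fillna(-1)`; any code that maps labels to colours would then need a separate colour for the sentinel.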

In [80]:
scarb_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,2.0,Fast Food Restaurant,Vietnamese Restaurant,Chinese Restaurant,Grocery Store,General Entertainment
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,0.0,Bar,Moving Target,Vietnamese Restaurant,Chinese Restaurant,General Entertainment
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,1.0,Electronics Store,Rental Car Location,Breakfast Spot,Pizza Place,Medical Center
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1.0,Coffee Shop,Insurance Office,Korean Restaurant,Vietnamese Restaurant,Hakka Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,1.0,Thai Restaurant,Athletics & Sports,Bakery,Bank,Fried Chicken Joint
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,4.0,Spa,Playground,Vietnamese Restaurant,Caribbean Restaurant,General Entertainment
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029,1.0,Discount Store,Department Store,Bus Station,Coffee Shop,Train Station
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577,1.0,Bus Line,Bakery,Metro Station,Soccer Field,Intersection
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476,1.0,American Restaurant,Skating Rink,Motel,Vietnamese Restaurant,Chinese Restaurant
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848,1.0,Café,General Entertainment,Skating Rink,College Stadium,Caribbean Restaurant


In [81]:
#change cluster label type
scarb_merged['Cluster Labels'] = scarb_merged['Cluster Labels'].astype(np.int64)
scarb_merged.dtypes
#scarb_merged['Cluster Labels']=scarb_merged['Cluster Labels'].astype(int)

PostalCode                object
Borough                   object
Neighborhood              object
Latitude                 float64
Longitude                float64
Cluster Labels             int64
1st Most Common Venue     object
2nd Most Common Venue     object
3rd Most Common Venue     object
4th Most Common Venue     object
5th Most Common Venue     object
dtype: object

Finally, let's visualize the resulting clusters

In [82]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scarb_merged['Latitude'], scarb_merged['Longitude'], scarb_merged['Neighborhood'], scarb_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
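
The colour list only needs one distinct hex colour per cluster. A minimal standard-library sketch that spaces hues evenly is shown below; it is an alternative to the matplotlib colormap used above rather than a reproduction of it, and `cluster_colors` is a hypothetical helper.

```python
import colorsys


def cluster_colors(n):
    """Return n hex colours with evenly spaced hues."""
    colours = []
    for k in range(n):
        # Fixed saturation and brightness; only the hue varies per cluster
        r, g, b = colorsys.hsv_to_rgb(k / n, 0.8, 0.9)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


palette = cluster_colors(5)
print(palette)
```

With such a list, a marker for cluster label `c` can take `palette[c]` directly.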

#### Cluster 1

In [83]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 0, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Scarborough,0,Bar,Moving Target,Vietnamese Restaurant,Chinese Restaurant,General Entertainment
16,Scarborough,0,0,0,0,0,0


#### Cluster 2

In [84]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 1, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Scarborough,1,Electronics Store,Rental Car Location,Breakfast Spot,Pizza Place,Medical Center
3,Scarborough,1,Coffee Shop,Insurance Office,Korean Restaurant,Vietnamese Restaurant,Hakka Restaurant
4,Scarborough,1,Thai Restaurant,Athletics & Sports,Bakery,Bank,Fried Chicken Joint
6,Scarborough,1,Discount Store,Department Store,Bus Station,Coffee Shop,Train Station
7,Scarborough,1,Bus Line,Bakery,Metro Station,Soccer Field,Intersection
8,Scarborough,1,American Restaurant,Skating Rink,Motel,Vietnamese Restaurant,Chinese Restaurant
9,Scarborough,1,Café,General Entertainment,Skating Rink,College Stadium,Caribbean Restaurant
10,Scarborough,1,Indian Restaurant,Pet Store,Chinese Restaurant,Furniture / Home Store,Latin American Restaurant
11,Scarborough,1,Accessories Store,Auto Garage,Bakery,Shopping Mall,Sandwich Place
12,Scarborough,1,Skating Rink,Breakfast Spot,Lounge,Clothing Store,Vietnamese Restaurant


#### Cluster 3

In [85]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 2, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Scarborough,2,Fast Food Restaurant,Vietnamese Restaurant,Chinese Restaurant,Grocery Store,General Entertainment


#### Cluster 4

In [86]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 3, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
14,Scarborough,3,Park,Playground,Vietnamese Restaurant,Caribbean Restaurant,General Entertainment


#### Cluster 5

In [87]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 4, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Scarborough,4,Spa,Playground,Vietnamese Restaurant,Caribbean Restaurant,General Entertainment


## Results & Discussions<a name="res"></a>

We now have the most common venue categories for each neighborhood in Scarborough. The results also show that most neighborhoods, 12 of the 16 that returned venues, fall into the cluster labelled 1 (listed above under "Cluster 2").
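
The imbalance between clusters can be read directly from the label array printed by `kmeans.labels_`. A short check using the values copied from that output:

```python
from collections import Counter

# Cluster labels as printed by kmeans.labels_ earlier in the notebook
labels = [1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 2, 4, 1]

sizes = Counter(labels)
for label, size in sorted(sizes.items()):
    print(f"label {label}: {size} neighborhood(s)")
```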

## Conclusion<a name="concl"></a>

The purpose of this project was to identify the neighborhoods of Scarborough and the types of business thriving in them, so that a new immigrant with a business in mind can narrow down and finalize a business plan.

We see that the cluster labelled 1 is the epicenter of most business. A stakeholder can use this information to join the established market in that cluster, or can take advantage of the fact that the other clusters have few restaurants and pick one of them instead. As an example, further analysis could take into account the ethnic makeup and population density of each area: in a cluster where a particular community is large but there are not enough restaurants catering to its tastes, opening a restaurant in that category would probably be a good way to start. The final decision on how to interpret this information is left to its user.