## Coursera_IBM_Applied-Data-Science-Capstone
#### This notebook contains my work for the Coursera IBM Applied Data Science Capstone, one of the courses in the IBM Data Science Professional Certificate

### Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto
##### (C) Ahmed Tealeb

### Part-1: Extract the raw table (from the Wikipedia page) and save it to a CSV file

##### 1. Start by creating a new Notebook for this assignment.

##### 2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

##### 3. To create the above dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is <B>Not assigned</B>.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that <B>M5A</B> is listed twice and has two neighborhoods: <B>Harbourfront</B> and <B>Regent Park</B>. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in <B>row 11</B> in the above table.
- If a cell has a borough but a <B>Not assigned</B> neighborhood, then the neighborhood will be the same as the borough. So for the <B>9th</B> cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be <B>Queen's Park</B>.
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the <B>.shape</B> method to print the number of rows of your dataframe.

##### 4. Submit a link to your Notebook on your Github repository. <B>(10 marks)</B>

Note: There are different website scraping libraries and packages in Python. One of the most common packages is BeautifulSoup. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

The package is so popular that there is a plethora of tutorials and examples of how to use it. Here is a very good Youtube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k

Use the BeautifulSoup package to transform the data in the table on the Wikipedia page into the above pandas dataframe

#### Step 1: Use BeautifulSoup, one of the most common scraping packages, to download the table data

In [3]:
# Importing Libraries

import pandas as pd
import numpy as np
import os, sys
import urllib
import requests 
from urllib.request import urlopen
from bs4 import BeautifulSoup

wikipedia_link = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
raw_wikipedia_page = urlopen(wikipedia_link)
soup = BeautifulSoup(raw_wikipedia_page, 'html.parser') # Parse the page with an explicit parser to avoid the "no parser specified" warning
raw_wikipedia_page.close()
 
fp = open("Toronto_FSAs_Raw.csv", "w")
tables = soup.findAll('table')
Toronto_FSAs_table = tables[0]
for tr in Toronto_FSAs_table.tbody.findAll('tr'):
    # print(tr.findAll('th'))
    for th in tr.findAll('th'):
        text = th.getText().strip() + ','
        fp.write(text)
    for td in tr.findAll('td'):
        text = td.getText().strip() + ','
        fp.write(text)
    fp.write('\n')
fp.close()
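
The loop above appends a comma after every cell, so each line ends with a trailing comma (which is why Step 2 drops an `Unnamed: 3` column), and a cell that itself contains a comma would be split across columns. As a minimal sketch of an alternative, assuming the rows have already been extracted into lists of strings, Python's standard `csv` module handles both cases. The helper name `rows_to_csv` and the sample rows are illustrative only and are not part of the original notebook.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize a list of rows (each a list of strings) to CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes any field that contains a comma and adds no trailing comma
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Illustrative rows shaped like the scraped table
sample_rows = [
    ["Postcode", "Borough", "Neighbourhood"],
    ["M3A", "North York", "Parkwoods"],
    ["M5A", "Downtown Toronto", "Harbourfront, Regent Park"],
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Note that a file written this way would have no `Unnamed: 3` column, so the `drop` call in Step 2 would no longer be needed.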

#### Step 2: Load the dataframe - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

In [4]:
import pandas as pd
df = pd.read_csv('Toronto_FSAs_Raw.csv')
df.drop('Unnamed: 3', axis = 1, inplace = True)
df.rename(columns={'Postcode':'PostalCode'}, inplace=True)
df.head()
# dfs.shape

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### Step 3: Remove the rows where the "Borough" column is "Not assigned"

In [5]:
# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

df_Cleaned = df[ ~ df['Borough'].str.contains('Not assigned')]
df_Cleaned.shape

(212, 3)

#### Step 4: Combine the "Neighbourhood" values by grouping on "PostalCode" and "Borough"

In [6]:
# More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, 
# you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. 
# These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

grouped = df_Cleaned.groupby(['PostalCode', 'Borough'], as_index = False).agg(', '.join)
# df_grouped = pd.DataFrame(grouped.sum())
df_grouped = pd.DataFrame(grouped)
df_grouped.head()
df_grouped.shape

(103, 3)
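
To make the grouping step concrete, here is a small sketch on toy data. The three rows are illustrative (they follow the M5A example from the assignment text), and the variable names `toy` and `combined` are not part of the original notebook.

```python
import pandas as pd

# Toy rows shaped like df_Cleaned; M5A appears twice with different neighbourhoods
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})

# Join the neighbourhood names within each (PostalCode, Borough) group
combined = (
    toy.groupby(["PostalCode", "Borough"])["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(combined)
```

Each postal code now occupies one row, and the M5A row holds both neighbourhood names separated by a comma.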

#### Step 5: Replace the "Not assigned" in 'Neighbourhood' column with 'Borough' column value.

In [7]:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. 
# So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

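# Copy the Borough value into Neighbourhood wherever Neighbourhood is 'Not assigned'.
# (Assigning to a row pulled out with .iloc may only change a copy, not df_grouped itself.)
not_assigned = df_grouped['Neighbourhood'] == 'Not assigned'
df_grouped.loc[not_assigned, 'Neighbourhood'] = df_grouped.loc[not_assigned, 'Borough']

# Save the cleaned dataframe once, after all replacements are done
df_grouped.to_csv('TorontoFSAs.csv', index = False)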
df_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


#### Step 6: Use the .shape attribute to print the number of rows of the dataframe

In [8]:
# In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.
df_grouped.shape

(103, 3)

In [9]:
# Subset the dataframe based on a PostalCode List - Additional Step
PostalCodesList = ['M5G', 'M2H', 'M4B', 'M1J', 'M4G', 'M4M', 'M1R', 'M9V', 'M9L', 'M5V', 'M1B', 'M5A']
#df_grouped.PostalCode.isin(PostalCodesList)
df_group_Unordered = df_grouped[df_grouped['PostalCode'].isin(PostalCodesList)].reset_index(drop=True)
df_group_Unordered

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1J,Scarborough,Scarborough Village
2,M1R,Scarborough,"Maryvale, Wexford"
3,M2H,North York,Hillcrest Village
4,M4B,East York,"Woodbine Gardens, Parkview Hill"
5,M4G,East York,Leaside
6,M4M,East Toronto,Studio District
7,M5A,Downtown Toronto,"Harbourfront, Regent Park"
8,M5G,Downtown Toronto,Central Bay Street
9,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo..."


### Part-2: Get the latitude and longitude coordinates of each neighborhood and save them to a CSV file

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code: a call may return None, and the same call made again may return the coordinates. To make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code. Taking postal code <B>M5G</B> as an example, your code would look something like this:
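
A minimal sketch of such a retry loop, not the course's original snippet: it uses a bounded number of attempts rather than an open-ended `while` loop, and it takes the lookup as a parameter so it can run without network access. The names `geocode_with_retry` and `flaky_lookup`, the query string format, and the placeholder coordinates are all assumptions made for illustration. With the Geocoder package, `lookup` could wrap a call such as `geocoder.google(query).latlng`.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable that returns a (lat, lng) pair or None.
    """
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause briefly before trying again
    return None  # every attempt failed

# Stand-in lookup that fails twice before succeeding, to mimic the flaky behaviour.
# The returned coordinates are placeholders, not real geocoding results.
calls = {"count": 0}

def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else (0.0, 0.0)

coords = geocode_with_retry(flaky_lookup, "M5G, Toronto, Ontario")
print(coords, calls["count"])
```

Capping the attempts avoids an infinite loop when a postal code can never be resolved; the caller then has to handle a `None` result.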

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

Use the Geocoder package or the csv file to create the following dataframe:

Important Note: There is a limit on how many times you can call the geocoder.google function: 2500 times per day. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

#### Step 1: Load Geospatial_Coordinates.csv into a new dataframe called "geo_data"

In [10]:
geo_data = pd.read_csv("Geospatial_Coordinates.csv")
geo_data.rename(columns={'Postal Code':'PostalCode'}, inplace = True)
geo_data.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### Step 2: Merge the "df_grouped" and "geo_data" dataframes on the "PostalCode" column into a new dataframe

In [11]:
df_merge = pd.merge(df_grouped, geo_data, on = 'PostalCode')
df_merge.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
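
`pd.merge` performs an inner join by default, so any postal code missing from the coordinates file would be dropped without warning. As a hedged sketch of how one might check for that, the toy example below uses a left join with `indicator=True`. The dataframes, the made-up code `M9Z`, and the borough name `Example Borough` are illustrative assumptions, not data from this notebook.

```python
import pandas as pd

neighbourhoods = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],  # "M9Z" is a made-up code with no coordinates
    "Borough": ["Scarborough", "Scarborough", "Example Borough"],
})
coordinates = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# A left join keeps every neighbourhood row; the _merge column records where each row matched
checked = pd.merge(neighbourhoods, coordinates, on="PostalCode", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

An empty `missing` list would confirm that every postal code received coordinates.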


#### Step 3: Save the dataframe "df_merge" to csv File

In [12]:
df_merge.to_csv('TorontoFSAs_Coordinates.csv', index = False)

### Part-3: Segmenting and Clustering Neighborhoods in Toronto

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

1. to add enough Markdown cells to explain what you decided to do and to report any observations you make.
2. to generate maps to visualize your neighborhoods and how they cluster together.
Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. <B>(3 marks)</B>

#### Step 1: Use Google to find the Latitude and Longitude of Toronto, Canada

In [13]:
from geopy.geocoders import Nominatim
address = 'Toronto, CA'

# geolocator = Nominatim()
# location = geolocator.geocode(address)
# latitude = location.latitude
# longitude = location.longitude

latitude=43.653963
longitude=-79.387207
print('The Geographical Coordinate of Toronto is {}, {}.'.format(latitude, longitude))

The Geographical Coordinate of Toronto is 43.653963, -79.387207.


#### Step 2: Import the required libraries

In [14]:
import numpy as np # Library to handle data in a vectorized manner

import pandas as pd # Library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# Uncomment this line if you have not completed the Foursquare API lab
# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # Convert an address into Latitude and Longitude Values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Uncomment this line if you have not completed the Foursquare API lab
# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Step 3: Create the map of Toronto using Latitude and Longitude values

In [15]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start = 10)

# Add markers to map
for lat, lng, borough, neighborhood in zip(df_merge['Latitude'], df_merge['Longitude'], df_merge['Borough'], df_merge['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7).add_to(map_Toronto)  
    
map_Toronto

#### Step 4: Selecting a specific "Borough"

In [16]:
Scarborough_data = df_merge[df_merge['Borough'] == 'Scarborough'].reset_index(drop = True)
Scarborough_data

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


In [17]:
# containsToronto_data = df_merge[df_merge['Borough'].str.contains("Toronto")].reset_index(drop = True)
# containsToronto_data

#### Step 5: Use Google to find the Latitude and Longitude of Scarborough, Canada

In [18]:
address = 'Scarborough, CA'

# geolocator = Nominatim()
# location = geolocator.geocode(address)
# latitude = location.latitude
# longitude = location.longitude

latitude = 43.773077
longitude = -79.257774
print('The Geographical Coordinate of Scarborough is {}, {}.'.format(latitude, longitude))

The Geographical Coordinate of Scarborough is 43.773077, -79.257774.


#### Step 6: Create the map of Scarborough using Latitude and Longitude values

In [19]:
map_Scarborough = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, label in zip(Scarborough_data['Latitude'], Scarborough_data['Longitude'], Scarborough_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7).add_to(map_Scarborough)  
    
map_Scarborough

### Input my foursquare ID

#### Step 7: My Foursquare Project Credentials

In [20]:
CLIENT_ID = '1DSRC5HBGQXKI2PVYIUZY2EDC2N0QXKLJ32YSJVNLXJNXP12' # Enter your Foursquare Client ID
CLIENT_SECRET = 'M551JVRZT2OYTPWAGQQTFDZF4UBOHDQTRI4ACE1Y0T14SVIF' # Enter your Foursquare Client Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1DSRC5HBGQXKI2PVYIUZY2EDC2N0QXKLJ32YSJVNLXJNXP12
CLIENT_SECRET:M551JVRZT2OYTPWAGQQTFDZF4UBOHDQTRI4ACE1Y0T14SVIF


In [21]:
Scarborough_data.loc[0, 'Neighbourhood']

'Rouge, Malvern'

In [25]:
Scarborough_data.shape

(17, 5)

#### Step 8: Find the Latitude and longitude values of Rouge, Malvern

In [26]:
neighborhood_latitude = Scarborough_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Scarborough_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = Scarborough_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and Longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and Longitude values of Rouge, Malvern are 43.806686299999996, -79.19435340000001.


#### Step 9: Build the URL to fetch the data from Foursquare

In [27]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=1DSRC5HBGQXKI2PVYIUZY2EDC2N0QXKLJ32YSJVNLXJNXP12&client_secret=M551JVRZT2OYTPWAGQQTFDZF4UBOHDQTRI4ACE1Y0T14SVIF&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100'
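
As an alternative to formatting the query string by hand, the sketch below builds the same kind of URL with `urllib.parse.urlencode` from the standard library. The endpoint and parameter names are taken from the cell above; the function name `build_explore_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a Foursquare venues/explore URL from its query parameters."""
    base = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes special characters, e.g. the comma in "ll" becomes %2C
    return base + "?" + urlencode(params)

# Placeholder credentials, not real ones
example_url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                                43.806686, -79.194353)
print(example_url)
```

Keeping the parameters in a dictionary also makes it easier to change the radius or limit per request.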

### Get the JSON from the URL

#### Step 10: Request the URL and parse the JSON response

In [28]:
s = requests.get(url)
results = s.json()
results

{'meta': {'code': 200, 'requestId': '5c3abb86db04f57dcf9a3c3f'},
  'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 1,
  'suggestedBounds': {'ne': {'lat': 43.8111863045, 'lng': -79.18812958073042},
   'sw': {'lat': 43.80218629549999, 'lng': -79.2005772192696}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bb6b9446edc76b0d771311c',
       'name': "Wendy's",
       'location': {'crossStreet': 'Morningside & Sheppard',
        'lat': 43.80744841934756,
        'lng': -79.19905558052072,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.80744841934756,
          'lng': -79.19905558052072}],
        'distance': 387,
        'cc': 'CA',
        'city': 'Toronto',
    

### Function to get the venue's category

In [31]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # the flattened dataframe uses the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
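
To illustrate what the function returns, here is a small sketch that applies it to two hand-written rows. The function is repeated, catching only `KeyError`, so the block runs on its own. The sample rows are plain dictionaries standing in for dataframe rows, and the second row (`Hypothetical venue`) is invented to show the empty-category case.

```python
def get_category_type(row):
    """Return the first category name for a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# A row shaped like the flattened JSON (key 'venue.categories')
flattened_row = {'venue.name': "Wendy's",
                 'venue.categories': [{'name': 'Fast Food Restaurant'}]}
# A hypothetical row with no categories at all
plain_row = {'name': 'Hypothetical venue', 'categories': []}

print(get_category_type(flattened_row))
print(get_category_type(plain_row))
```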

### Check how many venues the JSON response contains

In [33]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056


In [34]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

1 venues were returned by Foursquare.


### Function to find the nearby venues

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
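
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. The sketch below shows one way the per-venue rows could be built with a guard for that case. The helper name `venue_rows` and the second sample item are hypothetical; the first sample item reuses values shown earlier in this notebook.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating venues without categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    # Hypothetical venue with an empty category list
    {'venue': {'name': 'Hypothetical Venue',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]
rows = venue_rows('Rouge, Malvern', 43.806686, -79.194353, sample_items)
print(rows)
```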

In [37]:
Scarborough_venues = getNearbyVenues(names=Scarborough_data['Neighbourhood'],
                                   latitudes=Scarborough_data['Latitude'],
                                   longitudes=Scarborough_data['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West, Steeles West
Upper Rouge


In [38]:
Scarborough_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
5,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
6,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
7,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center
8,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Eggsmart,43.7678,-79.190466,Breakfast Spot
9,Woburn,43.770992,-79.216917,Starbucks,43.770037,-79.221156,Coffee Shop


### No venues were returned for "Upper Rouge", so we remove this row from Scarborough_data

In [39]:
Scarborough_data.tail()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
12,M1S,Scarborough,Agincourt,43.7942,-79.262029
13,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302
14,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577
15,M1W,Scarborough,"L'Amoreaux West, Steeles West",43.799525,-79.318389
16,M1X,Scarborough,Upper Rouge,43.836125,-79.205636


In [40]:
Scarborough_data.drop(index = 16, inplace = True)
Scarborough_data.tail()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
11,M1R,Scarborough,"Maryvale, Wexford",43.750072,-79.295849
12,M1S,Scarborough,Agincourt,43.7942,-79.262029
13,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302
14,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577
15,M1W,Scarborough,"L'Amoreaux West, Steeles West",43.799525,-79.318389


In [41]:
print(Scarborough_venues.shape)
Scarborough_venues.head()

(87, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


### Group the venues by neighborhood

In [42]:
Scarborough_venues.groupby('Neighborhood').head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
5,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
6,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
7,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center
9,Woburn,43.770992,-79.216917,Starbucks,43.770037,-79.221156,Coffee Shop
10,Woburn,43.770992,-79.216917,Tim Hortons,43.770827,-79.223078,Coffee Shop


In [43]:
print('There are {} unique categories.'.format(len(Scarborough_venues['Venue Category'].unique())))

There are 53 unique categories.


### One-hot encode the venue categories for machine learning

In [45]:
# one hot encoding
Scarborough_onehot = pd.get_dummies(Scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Scarborough_onehot['Neighborhood'] = Scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Scarborough_onehot.columns[-1]] + list(Scarborough_onehot.columns[:-1])
Scarborough_onehot = Scarborough_onehot[fixed_columns]

Scarborough_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Bus Line,Bus Station,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Construction & Landscaping,Cosmetics Shop,Department Store,Discount Store,Electronics Store,Fast Food Restaurant,Fried Chicken Joint,Furniture / Home Store,General Entertainment,Grocery Store,Hakka Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Lounge,Medical Center,Metro Station,Mexican Restaurant,Motel,Moving Target,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Thai Restaurant,Thrift / Vintage Store,Train Station,Vietnamese Restaurant
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [46]:
Scarborough_onehot.shape

(87, 54)

In [47]:
Scarborough_grouped = Scarborough_onehot.groupby('Neighborhood').mean().reset_index()
Scarborough_grouped

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Bus Line,Bus Station,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Construction & Landscaping,Cosmetics Shop,Department Store,Discount Store,Electronics Store,Fast Food Restaurant,Fried Chicken Joint,Furniture / Home Store,General Entertainment,Grocery Store,Hakka Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Lounge,Medical Center,Metro Station,Mexican Restaurant,Motel,Moving Target,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Thai Restaurant,Thrift / Vintage Store,Train Station,Vietnamese Restaurant
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
1,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.125,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0
4,"Clairlea, Golden Mile, Oakridge",0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
5,"Clarks Corners, Sullivan, Tam O'Shanter",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0
6,"Cliffcrest, Cliffside, Scarborough Village West",0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Dorset Park, Scarborough Town Centre, Wexford ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857
8,"East Birchmount Park, Ionview, Kennedy Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.166667,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0
9,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [48]:
Scarborough_grouped.shape

(16, 54)
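
The values in `Scarborough_grouped` are category frequencies: averaging the 0/1 one-hot columns within a neighborhood gives the share of that neighborhood's venues in each category. A small sketch on toy data, with made-up neighborhoods "A" and "B", shows the idea.

```python
import pandas as pd

# Toy venues: neighborhood "A" has four venues, "B" has two
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Bakery", "Bank", "Park", "Playground"],
})

# One column per category, with a 1 where the venue belongs to that category
onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of each 0/1 column per neighborhood is the category's frequency there
frequencies = onehot.groupby("Neighborhood").mean().reset_index()
print(frequencies)
```

Because every venue has exactly one category here, each row of frequencies sums to 1.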

### Review the Top 5 Venues

In [49]:
num_top_venues = 5

for hood in Scarborough_grouped['Neighborhood']:
    print("----"+ hood +"----")
    temp = Scarborough_grouped[Scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['Venue','Frequency']
    temp = temp.iloc[1:]
    temp['Frequency'] = temp['Frequency'].astype(float)
    temp = temp.round({'Frequency': 2})
    print(temp.sort_values('Frequency', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Agincourt----
                 Venue  Frequency
0               Lounge       0.25
1       Breakfast Spot       0.25
2         Skating Rink       0.25
3       Sandwich Place       0.25
4  American Restaurant       0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                 Venue  Frequency
0           Playground        0.5
1                 Park        0.5
2  American Restaurant        0.0
3            Pet Store        0.0
4    Korean Restaurant        0.0


----Birch Cliff, Cliffside West----
                   Venue  Frequency
0  General Entertainment       0.25
1           Skating Rink       0.25
2                   Café       0.25
3        College Stadium       0.25
4    American Restaurant       0.00


----Cedarbrae----
                Venue  Frequency
0              Lounge       0.12
1              Bakery       0.12
2                Bank       0.12
3  Athletics & Sports       0.12
4     Thai Restaurant       0.12


----Clairlea, Golden Mile, Oakrid

### Function to Sort the Most Common Venues

In [50]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)    
    return row_categories_sorted.index.values[0:num_top_venues]
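
To make the function's behavior concrete, here is a minimal sketch that applies it to a single made-up row. The neighborhood name and the frequencies are invented for illustration and are not taken from the Foursquare results; the function body is the same as in the cell above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name), then rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequency row; the values are illustrative only.
toy_row = pd.Series(
    {"Neighborhood": "Toyville", "Bakery": 0.1, "Park": 0.6, "Bank": 0.3}
)

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

The function returns venue category names (the index labels), not the frequencies themselves, which is why the later table contains only names.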

In [51]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Scarborough_grouped['Neighborhood']

for ind in np.arange(Scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Skating Rink,Sandwich Place,Breakfast Spot,Lounge,Vietnamese Restaurant,College Stadium,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant
1,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Vietnamese Restaurant,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store,Department Store
2,"Birch Cliff, Cliffside West",College Stadium,General Entertainment,Skating Rink,Café,Vietnamese Restaurant,Grocery Store,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
3,Cedarbrae,Athletics & Sports,Thai Restaurant,Bakery,Bank,Fried Chicken Joint,Lounge,Caribbean Restaurant,Hakka Restaurant,Vietnamese Restaurant,Cosmetics Shop
4,"Clairlea, Golden Mile, Oakridge",Bus Line,Bakery,Intersection,Fast Food Restaurant,Metro Station,Bus Station,Park,Soccer Field,Bar,Cosmetics Shop
5,"Clarks Corners, Sullivan, Tam O'Shanter",Pizza Place,Chinese Restaurant,Noodle House,Thai Restaurant,Fried Chicken Joint,Italian Restaurant,Fast Food Restaurant,Pharmacy,College Stadium,Construction & Landscaping
6,"Cliffcrest, Cliffside, Scarborough Village West",Intersection,Motel,American Restaurant,Thai Restaurant,Coffee Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
7,"Dorset Park, Scarborough Town Centre, Wexford ...",Indian Restaurant,Chinese Restaurant,Furniture / Home Store,Latin American Restaurant,Pet Store,Vietnamese Restaurant,Bar,Breakfast Spot,Grocery Store,General Entertainment
8,"East Birchmount Park, Ionview, Kennedy Park",Discount Store,Coffee Shop,Chinese Restaurant,Department Store,Train Station,Bank,Bar,Hakka Restaurant,Grocery Store,General Entertainment
9,"Guildwood, Morningside, West Hill",Electronics Store,Breakfast Spot,Rental Car Location,Medical Center,Pizza Place,Mexican Restaurant,Vietnamese Restaurant,College Stadium,Furniture / Home Store,Fried Chicken Joint
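
The column headers in this table are built from a three-element suffix list with a fallback to "th". That is correct for the ten columns used here, but it would produce "21th" if `num_top_venues` were ever raised above 20. As an optional alternative, a small helper can compute the suffix for any positive integer. The name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 always take 'th'.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```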


### Use K-means to Cluster the Neighborhoods

In [52]:
# Set number of clusters
kclusters = 3

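# keep only the numeric venue-frequency columns
# (the positional axis argument to drop() was removed in pandas 2.0)
Scarborough_grouped_clustering = Scarborough_grouped.drop(columns = 'Neighborhood')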

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(Scarborough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
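
For readers who want to see what the `KMeans` call does conceptually, the following is a minimal NumPy sketch of Lloyd's algorithm: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until nothing changes. It is not scikit-learn's implementation, which adds smarter initialisation and multiple restarts. The function name `lloyd_kmeans` and the toy data are made up for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy frequency rows: two dominated by the first category, two by the second.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only which rows share a label matters. This is also why the notebook's cluster 0, 1 and 2 cannot be read as a ranking.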

### Add Cluster Labels

In [56]:
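# work on a copy so that Scarborough_data itself is left unchanged
Scarborough_merged = Scarborough_data.copy()

# add clustering labels
# NOTE: the labels are assigned by position, so Scarborough_data must list the
# same neighborhoods, in the same order, as Scarborough_grouped
Scarborough_merged['Cluster Labels'] = kmeans.labels_
# match the column name used in neighborhoods_venues_sorted
Scarborough_merged.rename(columns = {'Neighbourhood':'Neighborhood'}, inplace = True)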
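
One thing to watch in this cell: `kmeans.labels_` follows the row order of `Scarborough_grouped`, while the assignment writes the labels into the postal-code table by position. If the two frames are ordered differently, or differ in length, labels land on the wrong rows. A minimal sketch of a key-based alternative follows. The frames `grouped` and `data` are small stand-ins, and the label values are invented, not the notebook's results.

```python
import pandas as pd

# Hypothetical stand-ins for Scarborough_grouped and Scarborough_data.
grouped = pd.DataFrame({
    "Neighborhood": ["Cedarbrae", "Guildwood, Morningside, West Hill", "Woburn"]
})
data = pd.DataFrame({
    "PostalCode": ["M1E", "M1G", "M1H"],
    "Neighborhood": ["Guildwood, Morningside, West Hill", "Woburn", "Cedarbrae"],
})
labels = [2, 0, 1]  # pretend output of kmeans.labels_, in `grouped` order

# Pair each label with the neighborhood it was computed for...
label_lookup = grouped[["Neighborhood"]].assign(**{"Cluster Labels": labels})

# ...then merge on the name so that row order no longer matters.
merged = data.merge(label_lookup, on="Neighborhood", how="left")
print(merged)
```

With `how="left"`, a postal code whose neighborhood returned no venues keeps its row and gets a missing label instead of raising a length-mismatch error.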

### Merge neighborhoods_venues_sorted to Scarborough_merged by column "Neighborhood"

In [57]:
Scarborough_merged = Scarborough_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')

Scarborough_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,0,Fast Food Restaurant,Vietnamese Restaurant,Indian Restaurant,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Electronics Store,Discount Store,Department Store
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,1,Bar,Moving Target,Vietnamese Restaurant,College Stadium,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0,Electronics Store,Breakfast Spot,Rental Car Location,Medical Center,Pizza Place,Mexican Restaurant,Vietnamese Restaurant,College Stadium,Furniture / Home Store,Fried Chicken Joint
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0,Coffee Shop,Korean Restaurant,Pharmacy,Vietnamese Restaurant,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0,Athletics & Sports,Thai Restaurant,Bakery,Bank,Fried Chicken Joint,Lounge,Caribbean Restaurant,Hakka Restaurant,Vietnamese Restaurant,Cosmetics Shop


### Create the Map

In [58]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Scarborough_merged['Latitude'], Scarborough_merged['Longitude'], Scarborough_merged['Neighborhood'], 
                                  Scarborough_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True,
        fill_color = rainbow[cluster-1],
        fill_opacity = 0.7).add_to(map_clusters)
       
map_clusters

### Verify Cluster 1 (Cluster Label 0)

In [59]:
Scarborough_merged.loc[Scarborough_merged['Cluster Labels'] == 0, Scarborough_merged.columns[[1] + list(range(3, Scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,43.806686,-79.194353,0,Fast Food Restaurant,Vietnamese Restaurant,Indian Restaurant,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Electronics Store,Discount Store,Department Store
2,Scarborough,43.763573,-79.188711,0,Electronics Store,Breakfast Spot,Rental Car Location,Medical Center,Pizza Place,Mexican Restaurant,Vietnamese Restaurant,College Stadium,Furniture / Home Store,Fried Chicken Joint
3,Scarborough,43.770992,-79.216917,0,Coffee Shop,Korean Restaurant,Pharmacy,Vietnamese Restaurant,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
4,Scarborough,43.773136,-79.239476,0,Athletics & Sports,Thai Restaurant,Bakery,Bank,Fried Chicken Joint,Lounge,Caribbean Restaurant,Hakka Restaurant,Vietnamese Restaurant,Cosmetics Shop
5,Scarborough,43.744734,-79.239476,0,Playground,Construction & Landscaping,Vietnamese Restaurant,Coffee Shop,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
6,Scarborough,43.727929,-79.262029,0,Discount Store,Coffee Shop,Chinese Restaurant,Department Store,Train Station,Bank,Bar,Hakka Restaurant,Grocery Store,General Entertainment
7,Scarborough,43.711112,-79.284577,0,Bus Line,Bakery,Intersection,Fast Food Restaurant,Metro Station,Bus Station,Park,Soccer Field,Bar,Cosmetics Shop
8,Scarborough,43.716316,-79.239476,0,Intersection,Motel,American Restaurant,Thai Restaurant,Coffee Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
9,Scarborough,43.692657,-79.264848,0,College Stadium,General Entertainment,Skating Rink,Café,Vietnamese Restaurant,Grocery Store,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
10,Scarborough,43.75741,-79.273304,0,Indian Restaurant,Chinese Restaurant,Furniture / Home Store,Latin American Restaurant,Pet Store,Vietnamese Restaurant,Bar,Breakfast Spot,Grocery Store,General Entertainment


### Verify Cluster 2 (Cluster Label 1)

In [61]:
Scarborough_merged.loc[Scarborough_merged['Cluster Labels'] == 1, Scarborough_merged.columns[[1] + list(range(3, Scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,43.784535,-79.160497,1,Bar,Moving Target,Vietnamese Restaurant,College Stadium,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
14,Scarborough,43.815252,-79.284577,1,Park,Playground,Vietnamese Restaurant,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store,Department Store


### Verify Cluster 3 (Cluster Label 2)

In [62]:
Scarborough_merged.loc[Scarborough_merged['Cluster Labels'] == 2, Scarborough_merged.columns[[1] + list(range(3, Scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Scarborough,43.781638,-79.304302,2,Pizza Place,Chinese Restaurant,Noodle House,Thai Restaurant,Fried Chicken Joint,Italian Restaurant,Fast Food Restaurant,Pharmacy,College Stadium,Construction & Landscaping
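
The three verification cells repeat the same `.loc` expression with a different label each time. As an optional tidy-up, the selection can be wrapped in a small helper and combined with a per-cluster row count. The sketch below uses a three-row miniature of the merged table, and the helper name `cluster_view` is hypothetical.

```python
import pandas as pd

# Miniature stand-in for Scarborough_merged with only one venue column.
merged = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough"] * 3,
    "Neighborhood": [
        "Rouge, Malvern",
        "Highland Creek, Rouge Hill, Port Union",
        "Guildwood, Morningside, West Hill",
    ],
    "Cluster Labels": [0, 1, 0],
    "1st Most Common Venue": ["Fast Food Restaurant", "Bar", "Electronics Store"],
})

def cluster_view(df, label):
    """Rows of one cluster, keeping column 1 and columns 3 onward."""
    keep = df.columns[[1] + list(range(3, df.shape[1]))]
    return df.loc[df["Cluster Labels"] == label, keep]

# Number of rows in each cluster, ordered by label.
sizes = merged["Cluster Labels"].value_counts().sort_index()
print(sizes)
print(cluster_view(merged, 0))
```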


### Thank you for completing this notebook!

This notebook was created by [Ahmed Tealeb](https://www.linkedin.com/in/ahmedtealeb/).

This notebook is part of an assignment on **Coursera** called *Applied Data Science Capstone*. 

<hr>
Copyright &copy; 2019