# Capstone Project Notebook
This notebook will be mainly used for the capstone project.

### WEEK 3 - part 1
1. Create a new Notebook for this assignment.
2. Create the dataframe:
* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma.
* If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
* Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
* In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
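
The cleaning rules above can be sketched on a small hand-made table. The rows below are illustrative only (they are not scraped from the Wikipedia page), and the variable names `raw` and `clean` are assumptions made for this sketch.

```python
import pandas as pd

# Illustrative rows only (not the real Wikipedia table).
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# Rule 1: drop rows whose borough is "Not assigned".
clean = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a "Not assigned" neighborhood takes the borough name.
missing = clean["Neighborhood"] == "Not assigned"
clean.loc[missing, "Neighborhood"] = clean.loc[missing, "Borough"]

# Rule 3: combine neighborhoods that share a postal code into one row.
clean = (
    clean.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
    .agg(", ".join)
)

print(clean)
print(clean.shape)
```

Filtering with a boolean mask and `.copy()` avoids modifying the raw table in place, which makes it easier to re-run the cell while experimenting.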

In [1]:
import pandas as pd
import numpy as np
import html5lib  # HTML parser that pd.read_html can use

# Read the first table from a fixed revision of the Wikipedia page
data = pd.read_html('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969', skiprows=0)[0]
df = data
df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
df.head(20)

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [2]:
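# Ignore cells with a borough that is Not assigned.
df.drop(df.index[df['Borough'] == 'Not assigned'], inplace=True)

# reset_index returns a new dataframe; the result is not assigned here,
# so df keeps its original index labels (as the output below shows).
df.reset_index(drop=True)
df.head(20)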

Unnamed: 0,PostalCode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


### WEEK 3 - part 2

In [3]:
# Group by postal code and borough so that each postal code appears once, with its neighbourhoods joined by commas
df_postcode = df.groupby(['PostalCode', 'Borough'])['Neighbourhood'].apply(','.join).reset_index()
df_postcode.head(20)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


The CSV file at https://cocl.us/Geospatial_data lists the latitude and longitude for each postal code.
With the dataframe of postal codes, boroughs, and neighborhoods built, the next step is to attach these coordinates to each row,
because the Foursquare location data is queried by latitude and longitude.
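
Before joining, it can help to check that every postal code actually finds coordinates. The sketch below uses a left merge with `indicator=True` for that. The two Scarborough rows are copied from the notebook output, while the code `M0X` is hypothetical and exists only to show what a missing match looks like.

```python
import pandas as pd

# Two rows copied from the notebook output, plus a hypothetical code ("M0X")
# that has no coordinates, to show how a missing match surfaces.
postcodes = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M0X"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = postcodes.merge(
    coords, left_on="PostalCode", right_on="Postal Code",
    how="left", indicator=True,
).drop(columns="Postal Code")

# Rows marked "left_only" found no coordinates in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would mean every postal code received a latitude and longitude.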

In [4]:
# Join both dataframes on postal code to attach a latitude and longitude to each postal code's neighbourhoods.

locgeo_df = pd.read_csv('https://cocl.us/Geospatial_data', index_col='Postal Code')
toronto_data = df_postcode.join(locgeo_df, on='PostalCode')
toronto_data.head(20)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


### WEEK 3 - part 3
Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 

Just make sure:

* to add enough Markdown cells to explain what you decided to do and to report any observations you make. 
* to generate maps to visualize your neighborhoods and how they cluster together. 

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

Start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [5]:
# Import all the dependencies that we will need.
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')


## 1. Explore Neighborhoods in Toronto and Etobicoke

#Let's get the geographical coordinates of Toronto.
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))


In [6]:
# Let's visualize Toronto and the neighborhoods in it

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[43.6534817, -79.3839347], zoom_start=13)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto



## 2. Explore Neighborhoods in Etobicoke

In [7]:
#Let's explore Etobicoke borough neighbourhoods
df = toronto_data
etobicoke_data = df[df['Borough'] == 'Etobicoke'].reset_index(drop=True)
etobicoke_data.head(20)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M8V,Etobicoke,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321
1,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484
2,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
3,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
4,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.650943,-79.554724
7,M9C,Etobicoke,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.643515,-79.577201
8,M9P,Etobicoke,Westmount,43.696319,-79.532242
9,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724


In [8]:
#Let's get the geographical coordinates of Etobicoke.
address = 'Etobicoke, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Etobicoke are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Etobicoke are 43.6435559, -79.5656326.


In [9]:
# Let's visualize Etobicoke and the neighborhoods in it

# create map of Etobicoke using latitude and longitude values
map_etobicoke = folium.Map(location=[43.6435559, -79.5656326], zoom_start=13)

# add markers for the Etobicoke neighborhoods only
for lat, lng, label in zip(etobicoke_data['Latitude'], etobicoke_data['Longitude'], etobicoke_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_etobicoke)

map_etobicoke

In [17]:
#Define Foursquare Credentials and Version

CLIENT_ID = 'T3ORTMFXBQ5HBJNGCDR0FGZPEM4AVHC0C5QDOLU5JKEJFDGO' # your Foursquare ID
CLIENT_SECRET = '0DMQI4D2UK01IJKWXG1GC4YUHUCS1MCRE15I4D2MAMYPSOND' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: T3ORTMFXBQ5HBJNGCDR0FGZPEM4AVHC0C5QDOLU5JKEJFDGO
CLIENT_SECRET:0DMQI4D2UK01IJKWXG1GC4YUHUCS1MCRE15I4D2MAMYPSOND
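
Hardcoding and printing the client secret exposes it to anyone who can read the notebook. One possible alternative, sketched below, reads the credentials from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this sketch, not a Foursquare convention.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment.

    The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET
    are assumptions made for this sketch, not a Foursquare convention.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from the real environment.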


In [18]:
# First, let's create the GET request URL and name it url.
# Etobicoke coordinates are 43.6435559, -79.5656326.

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # search radius in metres
neighborhood_latitude = 43.6435559
neighborhood_longitude = -79.5656326


# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=T3ORTMFXBQ5HBJNGCDR0FGZPEM4AVHC0C5QDOLU5JKEJFDGO&client_secret=0DMQI4D2UK01IJKWXG1GC4YUHUCS1MCRE15I4D2MAMYPSOND&v=20180605&ll=43.6435559,-79.5656326&radius=500&limit=100'
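
The same URL can also be assembled from a dictionary of query parameters, which keeps each parameter next to its name. This is a minimal sketch using the standard library; the function name `build_explore_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=100):
    """Build the explore URL used above from keyword parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "lat,lng" unescaped, matching the URL above.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

# Placeholder credentials; substitute real values when calling the API.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", 43.6435559, -79.5656326)
print(url)
```

`requests.get` also accepts a `params` dictionary directly, which would avoid building the query string by hand.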

In [19]:
import requests

# send the GET request using the url built in the previous cell
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6063ddf29a07886de2f33bde'},
 'response': {'headerLocation': 'Etobicoke West Mall',
  'headerFullLocation': 'Etobicoke West Mall, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 11,
  'suggestedBounds': {'ne': {'lat': 43.648055904500005,
    'lng': -79.55942570638175},
   'sw': {'lat': 43.6390558955, 'lng': -79.57183949361826}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c3876b80a71c9b6aca240c9',
       'name': "Farmer's Market Etobicoke",
       'location': {'crossStreet': 'Burnhamthrope & east mall',
        'lat': 43.64306101960027,
        'lng': -79.56619097935341,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.64306101960027,
          'lng': -79.56619097935341}],
        'dis

In [None]:
#All the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [20]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [23]:
# Now we are ready to clean the JSON and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(20)



Unnamed: 0,name,categories,lat,lng
0,Farmer's Market Etobicoke,Farmers Market,43.643061,-79.566191
1,Tim Hortons,Coffee Shop,43.644742,-79.56768
2,Loblaws,Grocery Store,43.643848,-79.560113
3,State & Main Kitchen & Bar,Restaurant,43.645778,-79.560374
4,Burnhamthorpe and The West Mall,Intersection,43.644786,-79.567065
5,West Mall Rink,Skating Rink,43.642138,-79.566218
6,W.E. Kitchen,Breakfast Spot,43.64507,-79.567022
7,Joe Fresh,Clothing Store,43.643911,-79.560126
8,Rabba,Convenience Store,43.647096,-79.563026
9,Four Seasons Place,Hotel,43.647128,-79.563009


In [22]:
#And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

11 venues were returned by Foursquare.


#Let's create a function to repeat the same process to all the neighborhoods in Etobicoke

In [None]:


def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
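
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue with an empty category list would raise an `IndexError`. The sketch below shows one way to tolerate that case. The helper name `venue_rows` and the sample items are hypothetical; the items are hand-written to resemble the response shown earlier.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style items into tuples, tolerating empty categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written items shaped like the response shown earlier; the second
# venue is hypothetical and has no category.
sample_items = [
    {"venue": {"name": "Tim Hortons",
               "location": {"lat": 43.644742, "lng": -79.56768},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.64, "lng": -79.56},
               "categories": []}},
]
rows = venue_rows("Westmount", 43.696319, -79.532242, sample_items)
print(rows)
```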

In [27]:
#create a new dataframe called etobicoke_venues
etobicoke_venues = getNearbyVenues(names=etobicoke_data['Neighbourhood'],
                                   latitudes=etobicoke_data['Latitude'],
                                   longitudes=etobicoke_data['Longitude']
                                  )

New Toronto, Mimico South, Humber Bay Shores
Alderwood, Long Branch
The Kingsway, Montgomery Road, Old Mill North
Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East
Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West
Islington Avenue, Humber Valley Village
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Westmount
Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens
Northwest, West Humber - Clairville


In [29]:
print(etobicoke_venues.shape)
etobicoke_venues.head(75)

(75, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,LCBO,43.602281,-79.499302,Liquor Store
1,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Domino's Pizza,43.601583,-79.500905,Pizza Place
2,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,New Toronto Fish & Chips,43.601849,-79.503281,Restaurant
3,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Delicia Bakery & Pastry,43.601403,-79.503012,Bakery
4,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Lucky Dice Restaurant,43.601392,-79.503056,Café
5,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,McDonald's,43.602464,-79.498859,Fast Food Restaurant
6,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Shoppers Drug Mart,43.601677,-79.502239,Pharmacy
7,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Maple Leaf House,43.60204,-79.498678,American Restaurant
8,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Coffee Time,43.602284,-79.499857,Coffee Shop
9,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,Pet Valu,43.602431,-79.498653,Pet Store


In [30]:
#Let's check how many venues were returned for each neighborhood
etobicoke_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Alderwood, Long Branch",9,9,9,9,9,9
"Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood",9,9,9,9,9,9
"Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens",3,3,3,3,3,3
"Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West",13,13,13,13,13,13
"New Toronto, Mimico South, Humber Bay Shores",12,12,12,12,12,12
"Northwest, West Humber - Clairville",5,5,5,5,5,5
"Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East",1,1,1,1,1,1
"South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens",10,10,10,10,10,10
"The Kingsway, Montgomery Road, Old Mill North",3,3,3,3,3,3
"West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale",2,2,2,2,2,2


In [31]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(etobicoke_venues['Venue Category'].unique())))

There are 42 unique categories.


## 3. Analyze Each Neighborhood

In [32]:
# one hot encoding
etobicoke_onehot = pd.get_dummies(etobicoke_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
etobicoke_onehot['Neighbourhood'] = etobicoke_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [etobicoke_onehot.columns[-1]] + list(etobicoke_onehot.columns[:-1])
etobicoke_onehot = etobicoke_onehot[fixed_columns]

etobicoke_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,Bakery,Bar,Baseball Field,Beer Store,Burger Joint,Café,Chinese Restaurant,Coffee Shop,Convenience Store,Dance Studio,Discount Store,Drugstore,Fast Food Restaurant,Fried Chicken Joint,Garden Center,Grocery Store,Gym,Hardware Store,Home Service,Intersection,Kids Store,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Pool,Pub,Rental Car Location,Restaurant,River,Sandwich Place,Shopping Plaza,Smoke Shop,Supplement Shop,Tanning Salon,Truck Stop,Wings Joint
0,"New Toronto, Mimico South, Humber Bay Shores",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"New Toronto, Mimico South, Humber Bay Shores",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,"New Toronto, Mimico South, Humber Bay Shores",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,"New Toronto, Mimico South, Humber Bay Shores",0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"New Toronto, Mimico South, Humber Bay Shores",0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
#Let's confirm the new size
etobicoke_onehot.shape

(75, 43)

In [34]:
# Let's group rows by neighborhood and take the mean of the frequency of occurrence of each category
etobicoke_grouped = etobicoke_onehot.groupby('Neighbourhood').mean().reset_index()
etobicoke_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Bakery,Bar,Baseball Field,Beer Store,Burger Joint,Café,Chinese Restaurant,Coffee Shop,Convenience Store,Dance Studio,Discount Store,Drugstore,Fast Food Restaurant,Fried Chicken Joint,Garden Center,Grocery Store,Gym,Hardware Store,Home Service,Intersection,Kids Store,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Pool,Pub,Rental Car Location,Restaurant,River,Sandwich Place,Shopping Plaza,Smoke Shop,Supplement Shop,Tanning Salon,Truck Stop,Wings Joint
0,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.222222,0.111111,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0
1,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0
2,"Kingsview Village, St. Phillips, Martin Grove ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
3,"Mimico NW, The Queensway West, South of Bloor,...",0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.076923,0.0,0.0,0.076923,0.076923,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.076923,0.0,0.076923
4,"New Toronto, Mimico South, Humber Bay Shores",0.083333,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.083333,0.083333,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Northwest, West Humber - Clairville",0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0
6,"Old Mill South, King's Mill Park, Sunnylea, Hu...",0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"South Steeles, Silverstone, Humbergate, Jamest...",0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.2,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0
8,"The Kingsway, Montgomery Road, Old Mill North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0
9,"West Deane Park, Princess Gardens, Martin Grov...",0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [35]:
#Let's confirm the new size
etobicoke_grouped.shape

(11, 43)
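
To see why the grouped values read as frequencies, here is a minimal sketch on a tiny made-up venue list (not Foursquare data). Each one-hot column holds 0 or 1, so its mean within a neighbourhood is the share of that neighbourhood's venues in the category.

```python
import pandas as pd

# Tiny illustrative venue list (not Foursquare data).
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Pub", "Gym", "Park", "Pub"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1, which is why a neighbourhood with a single venue shows a frequency of 1.0 for that venue's category.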

In [36]:
#Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in etobicoke_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = etobicoke_grouped[etobicoke_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.22
1             Pub  0.11
2             Gym  0.11
3        Pharmacy  0.11
4  Sandwich Place  0.11


----Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood----
          venue  freq
0  Liquor Store  0.11
1    Beer Store  0.11
2     Pet Store  0.11
3          Café  0.11
4      Pharmacy  0.11


----Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens----
                 venue  freq
0    Mobile Phone Shop  0.33
1                 Park  0.33
2       Sandwich Place  0.33
3  American Restaurant  0.00
4                  Pub  0.00


----Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West----
             venue  freq
0       Kids Store  0.08
1   Discount Store  0.08
2    Tanning Salon  0.08
3  Supplement Shop  0.08
4   Sandwich Place  0.08


----New Toronto, Mimico South, Humber Bay Shores----
                 venue  freq
0  American Restaurant  0.

In [37]:
#Let's put that into a pandas dataframe
#First, let's write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
#let's create the new dataframe and display the top 10 venues for each neighborhood.
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
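for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd come from the indicators list
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the end of the list get the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))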

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = etobicoke_grouped['Neighbourhood']

for ind in np.arange(etobicoke_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(etobicoke_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()




Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Alderwood, Long Branch",Pizza Place,Pub,Gym,Pharmacy,Sandwich Place,Coffee Shop,Dance Studio,Pool,American Restaurant,Mobile Phone Shop
1,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Liquor Store,Beer Store,Pet Store,Café,Pharmacy,Coffee Shop,Convenience Store,Shopping Plaza,Pizza Place,Rental Car Location
2,"Kingsview Village, St. Phillips, Martin Grove ...",Mobile Phone Shop,Park,Sandwich Place,American Restaurant,Pub,Mexican Restaurant,Middle Eastern Restaurant,Pet Store,Pharmacy,Pizza Place
3,"Mimico NW, The Queensway West, South of Bloor,...",Kids Store,Discount Store,Tanning Salon,Supplement Shop,Sandwich Place,Bakery,Hardware Store,Gym,Grocery Store,Fast Food Restaurant
4,"New Toronto, Mimico South, Humber Bay Shores",American Restaurant,Restaurant,Liquor Store,Bakery,Pet Store,Pharmacy,Gym,Pizza Place,Fast Food Restaurant,Mexican Restaurant
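
The `indicators` list works for the ten columns used here, but it would label later positions such as 21 or 22 with "th". If more columns were ever needed, a small helper could generate the suffixes instead. This is a sketch only; the function name `ordinal` is an assumption and is not part of the original lab code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighbourhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "...", columns[-1])
```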


## 4. Cluster Neighborhoods

In [62]:
# KMeans is part of scikit-learn, so no separate installation is needed
from sklearn.cluster import KMeans



In [63]:
# set number of clusters
kclusters = 5

etobicoke_grouped_clustering = etobicoke_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(etobicoke_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 4, 1, 1, 1, 2, 1, 3, 0], dtype=int32)
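
A quick count of the labels shows how unevenly the neighbourhoods are distributed across clusters. The sketch below uses only the first ten labels printed above.

```python
from collections import Counter

# Labels copied from the kmeans.labels_[0:10] output above.
labels = [1, 1, 4, 1, 1, 1, 2, 1, 3, 0]

sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print(f"cluster {cluster}: {count} neighbourhood(s)")
```

Among these ten labels, cluster 1 holds six neighbourhoods and every other cluster holds one, which suggests that a smaller `kclusters` may be worth trying on a dataset this small.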

In [67]:
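# add the cluster labels as the first column
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_venues_sorted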

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,"Alderwood, Long Branch",Pizza Place,Pub,Gym,Pharmacy,Sandwich Place,Coffee Shop,Dance Studio,Pool,American Restaurant,Mobile Phone Shop
1,1,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Liquor Store,Beer Store,Pet Store,Café,Pharmacy,Coffee Shop,Convenience Store,Shopping Plaza,Pizza Place,Rental Car Location
2,4,"Kingsview Village, St. Phillips, Martin Grove ...",Mobile Phone Shop,Park,Sandwich Place,American Restaurant,Pub,Mexican Restaurant,Middle Eastern Restaurant,Pet Store,Pharmacy,Pizza Place
3,1,"Mimico NW, The Queensway West, South of Bloor,...",Kids Store,Discount Store,Tanning Salon,Supplement Shop,Sandwich Place,Bakery,Hardware Store,Gym,Grocery Store,Fast Food Restaurant
4,1,"New Toronto, Mimico South, Humber Bay Shores",American Restaurant,Restaurant,Liquor Store,Bakery,Pet Store,Pharmacy,Gym,Pizza Place,Fast Food Restaurant,Mexican Restaurant
5,1,"Northwest, West Humber - Clairville",Rental Car Location,Drugstore,Bar,Truck Stop,Garden Center,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place
6,2,"Old Mill South, King's Mill Park, Sunnylea, Hu...",Baseball Field,American Restaurant,Rental Car Location,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Pool
7,1,"South Steeles, Silverstone, Humbergate, Jamest...",Pizza Place,Grocery Store,Fast Food Restaurant,Beer Store,Pharmacy,Coffee Shop,Sandwich Place,Fried Chicken Joint,American Restaurant,Park
8,3,"The Kingsway, Montgomery Road, Old Mill North",Pool,Smoke Shop,River,American Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy
9,0,"West Deane Park, Princess Gardens, Martin Grov...",Bakery,Home Service,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place


In [68]:
# Create a new dataframe that includes the cluster label as well as the top 10 venues for each neighbourhood.

# Left-join the sorted venues (with their cluster labels) onto etobicoke_data, which carries
# the latitude/longitude of each neighbourhood. Rows without a match get NaN in the joined columns.
etobicoke_merged = etobicoke_data.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

etobicoke_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M8V,Etobicoke,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,1.0,American Restaurant,Restaurant,Liquor Store,Bakery,Pet Store,Pharmacy,Gym,Pizza Place,Fast Food Restaurant,Mexican Restaurant
1,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,1.0,Pizza Place,Pub,Gym,Pharmacy,Sandwich Place,Coffee Shop,Dance Studio,Pool,American Restaurant,Mobile Phone Shop
2,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,3.0,Pool,Smoke Shop,River,American Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy
3,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,2.0,Baseball Field,American Restaurant,Rental Car Location,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Pool
4,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,1.0,Kids Store,Discount Store,Tanning Salon,Supplement Shop,Sandwich Place,Bakery,Hardware Store,Gym,Grocery Store,Fast Food Restaurant
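
The cluster labels above appear as floats (1.0, 3.0, ...). This is what a left join produces when at least one row has no match, because pandas stores the missing label as NaN and converts the column to float. A minimal sketch of that behaviour and one way to restore integer labels is shown here. The `left` and `right` frames and the "No Venues Here" row are made-up stand-ins for `etobicoke_data` and `neighborhoods_venues_sorted`, not data from this notebook.

```python
import pandas as pd

# Toy stand-ins for etobicoke_data and neighborhoods_venues_sorted (hypothetical values).
left = pd.DataFrame({
    "Neighbourhood": ["Alderwood, Long Branch", "No Venues Here",
                      "The Kingsway, Montgomery Road, Old Mill North"],
    "Latitude": [43.602414, 43.6, 43.653654],
})
right = pd.DataFrame({
    "Cluster Labels": [1, 3],
    "Neighbourhood": ["Alderwood, Long Branch",
                      "The Kingsway, Montgomery Road, Old Mill North"],
})

# Same join pattern as in the cell above (DataFrame.join defaults to a left join).
merged = left.join(right.set_index("Neighbourhood"), on="Neighbourhood")

# The unmatched row forces the integer label column to float64 with a NaN.
labels_dtype_after_join = merged["Cluster Labels"].dtype

# Assumption: rows without a cluster can simply be dropped before plotting.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
```

Whether dropping unmatched rows is appropriate depends on the analysis; keeping them and plotting them in a neutral color is an equally reasonable choice.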


In [87]:
#Finally, let's visualize the resulting clusters

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# Note: 'Cluster Labels' is a float column after the join, so rainbow[cluster-1] would need an
# int cast (and handling of any NaN label). Fixed colors are used instead; the cluster is shown in the popup.
for lat, lon, poi, cluster in zip(etobicoke_merged['Latitude'], etobicoke_merged['Longitude'], etobicoke_merged['Neighbourhood'], etobicoke_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=30,
        popup=label,
        #color=rainbow[cluster-1],
        color='#FF0700',
        fill=True,
        #fill_color=rainbow[cluster-1],
        fill_color='#087FBF',
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
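
The map cell draws every marker in the same fixed colors, with the per-cluster lookup `rainbow[cluster-1]` commented out. A small helper along the following lines could make a per-cluster lookup safe for float and missing labels. `cluster_color`, the `fallback` gray, and the five-color `palette` are hypothetical names and values used only for illustration, and the helper indexes by `int(cluster)` rather than `cluster-1`.

```python
import math

def cluster_color(cluster, palette, fallback="#808080"):
    """Return the palette color for a cluster label, or `fallback` for NaN/None."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        return fallback
    # Cast float labels such as 1.0 to int; wrap around if the palette is short.
    return palette[int(cluster) % len(palette)]

# Hypothetical 5-color palette standing in for the `rainbow` list.
palette = ["#8000ff", "#00b5eb", "#80ffb4", "#ffb360", "#ff0000"]

color_for_1 = cluster_color(1.0, palette)             # float label from the join
color_for_nan = cluster_color(float("nan"), palette)  # neighbourhood without a cluster
```

In the marker loop, `color=cluster_color(cluster, rainbow)` and `fill_color=cluster_color(cluster, rainbow)` would then take the place of the fixed hex values.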

# 5. Examine Clusters

In [90]:
# Examine each cluster and determine the venue categories that distinguish it.
# Based on the defining categories, each cluster can then be given a name.

# Cluster 1 (label 0)
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 0, etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Etobicoke,0.0,Bakery,Home Service,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place


In [91]:
# Cluster 2 (label 1)
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 1, etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Etobicoke,1.0,American Restaurant,Restaurant,Liquor Store,Bakery,Pet Store,Pharmacy,Gym,Pizza Place,Fast Food Restaurant,Mexican Restaurant
1,Etobicoke,1.0,Pizza Place,Pub,Gym,Pharmacy,Sandwich Place,Coffee Shop,Dance Studio,Pool,American Restaurant,Mobile Phone Shop
4,Etobicoke,1.0,Kids Store,Discount Store,Tanning Salon,Supplement Shop,Sandwich Place,Bakery,Hardware Store,Gym,Grocery Store,Fast Food Restaurant
7,Etobicoke,1.0,Liquor Store,Beer Store,Pet Store,Café,Pharmacy,Coffee Shop,Convenience Store,Shopping Plaza,Pizza Place,Rental Car Location
8,Etobicoke,1.0,Pizza Place,Discount Store,Chinese Restaurant,Intersection,Sandwich Place,Coffee Shop,Middle Eastern Restaurant,Shopping Plaza,Smoke Shop,Supplement Shop
10,Etobicoke,1.0,Pizza Place,Grocery Store,Fast Food Restaurant,Beer Store,Pharmacy,Coffee Shop,Sandwich Place,Fried Chicken Joint,American Restaurant,Park
11,Etobicoke,1.0,Rental Car Location,Drugstore,Bar,Truck Stop,Garden Center,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place


In [92]:
# Cluster 3 (label 2)
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 2, etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Etobicoke,2.0,Baseball Field,American Restaurant,Rental Car Location,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy,Pizza Place,Pool


In [93]:
# Cluster 4 (label 3)
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 3, etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Etobicoke,3.0,Pool,Smoke Shop,River,American Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Park,Pet Store,Pharmacy


In [94]:
# Cluster 5 (label 4)
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 4, etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Etobicoke,4.0,Mobile Phone Shop,Park,Sandwich Place,American Restaurant,Pub,Mexican Restaurant,Middle Eastern Restaurant,Pet Store,Pharmacy,Pizza Place
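
The five cells above repeat the same filter with a different label each time. As a sketch, the same inspection could be written as a single `groupby` that counts which category is the top venue in each cluster. The `sample` frame below copies a few rows from the outputs above (first two venue columns only) and stands in for `etobicoke_merged`; `top_venue_counts` is an illustrative name.

```python
import pandas as pd

# A few rows copied from the outputs above; stands in for etobicoke_merged.
sample = pd.DataFrame({
    "Cluster Labels": [0.0, 1.0, 1.0, 1.0, 2.0],
    "1st Most Common Venue": ["Bakery", "American Restaurant", "Pizza Place",
                              "Pizza Place", "Baseball Field"],
    "2nd Most Common Venue": ["Home Service", "Restaurant", "Pub",
                              "Grocery Store", "American Restaurant"],
})

# For each cluster label, count how often each category is the top venue.
# groupby skips rows whose label is NaN by default.
top_venue_counts = {
    int(label): group["1st Most Common Venue"].value_counts().to_dict()
    for label, group in sample.groupby("Cluster Labels")
}

for label, counts in top_venue_counts.items():
    print(f"Cluster {label + 1} (label {label}): {counts}")
```

Counts like these give a starting point for naming a cluster, for example by its dominant top venue, although clusters with a single neighbourhood offer little to generalize from.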
