<a href="https://cognitiveclass.ait"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a> 
<h1 align=center><font size = 6>Battle of the Neighborhoods: Toronto</font></h1>

## Introduction

This notebook takes a list of Toronto neighborhoods from Wikipedia and groups them into clusters using a combination of web scraping, geographic coordinates, FourSquare API requests, and machine learning. By the end, you will have created a set of neighborhood clusters for one of Toronto's districts, and you will be able to see on a map which neighborhood belongs to which cluster.

## Creating the Dataframe

To start off, we must gather our data on Toronto neighborhoods and place it into a <em>pandas</em> dataframe.

To accomplish this task, we will do the following steps:
-   Scrape the Wikipedia page using the BeautifulSoup package and save it to a Soup object
-   Create a list object that will later hold the contents of the table from the Wikipedia page
-   Locate the table in the Soup object
-   Loop through the table, placing the information from each table cell into a newly created dictionary object and appending it to the list object we created earlier
-   Create the <em>pandas</em> dataframe from the list object holding the information obtained in the previous step


### Import Packages

Start by importing the necessary Python packages for this project.

In [74]:
import pandas as pd
import numpy as np
import requests
import urllib.request
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans 
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize has been deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup

### Web Scraping

Using the <code>urllib.request</code> package, open the URL below and save the response to an object called <code>page</code>. Then parse the contents of the page using the <code>BeautifulSoup</code> package.

In [75]:
url = "https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M"

page = urllib.request.urlopen(url)

soup = BeautifulSoup(page, "lxml")

print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"a1be4833-379c-4f2f-a229-bafce4147efd","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":1013111980,"wgRevisionId":1013111980,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Wikipedia

### Locating our Table

Next, we need to locate the table within our <code>soup</code> object and save it to a new object called <code>table</code>. This can be done with the <code>find</code> function of our <code>soup</code> object. Before doing that, create an empty list called <code>table_contents</code>, which we will shortly fill with the contents of the table.

In [76]:
table_contents = []
table = soup.find('table')
print(table.prettify())

<table cellpadding="2" cellspacing="0" rules="all" style="width:100%; border-collapse:collapse; border:1px solid #ccc;">
 <tbody>
  <tr>
   <td style="width:11%; vertical-align:top; color:#ccc;">
    <p>
     <b>
      M1A
     </b>
     <br/>
     <span style="font-size:85%;">
      <i>
       Not assigned
      </i>
     </span>
    </p>
   </td>
   <td style="width:11%; vertical-align:top; color:#ccc;">
    <p>
     <b>
      M2A
     </b>
     <br/>
     <span style="font-size:85%;">
      <i>
       Not assigned
      </i>
     </span>
    </p>
   </td>
   <td style="width:11%; vertical-align:top;">
    <p>
     <b>
      M3A
     </b>
     <br/>
     <span style="font-size:85%;">
      <a href="/wiki/North_York" title="North York">
       North York
      </a>
      <br/>
      (
      <a href="/wiki/Parkwoods" title="Parkwoods">
       Parkwoods
      </a>
      )
     </span>
    </p>
   </td>
   <td style="width:11%; vertical-align:top;">
    <p>
     <b>
      M4A
     </b>
 

Using the loop below, extract the postal code, borough, and neighborhood from each table cell and append them to <code>table_contents</code>.

In [77]:
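for row in table.find_all('td'):
    cell = {}
    # Skip postal codes that have no borough assigned
    if row.span.text == "Not assigned":
        pass
    else:
        # The first three characters of the cell text are the postal code
        cell['PostalCode'] = row.p.text[:3]
        # The text before the opening parenthesis is the borough
        cell['Borough'] = (row.span.text).split('(')[0]
        # The text inside the parentheses lists the neighborhoods, separated by ' /'
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)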
        
print(table_contents)

[{'PostalCode': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'}, {'PostalCode': 'M4A', 'Borough': 'North York', 'Neighborhood': 'Victoria Village'}, {'PostalCode': 'M5A', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Regent Park, Harbourfront'}, {'PostalCode': 'M6A', 'Borough': 'North York', 'Neighborhood': 'Lawrence Manor, Lawrence Heights'}, {'PostalCode': 'M7A', 'Borough': "Queen's Park", 'Neighborhood': 'Ontario Provincial Government'}, {'PostalCode': 'M9A', 'Borough': 'Etobicoke', 'Neighborhood': 'Islington Avenue'}, {'PostalCode': 'M1B', 'Borough': 'Scarborough', 'Neighborhood': 'Malvern, Rouge'}, {'PostalCode': 'M3B', 'Borough': 'North York', 'Neighborhood': 'Don Mills North'}, {'PostalCode': 'M4B', 'Borough': 'East York', 'Neighborhood': 'Parkview Hill, Woodbine Gardens'}, {'PostalCode': 'M5B', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Garden District, Ryerson'}, {'PostalCode': 'M6B', 'Borough': 'North York', 'Neighborhood': 'Glencairn'}, {'PostalCode': 'M9
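
The chained string operations in the loop can be easier to follow when applied to a single cell's text. The sketch below runs the same steps on a plain string. `parse_span_text` is a hypothetical helper name introduced only for illustration, and the sample strings are assumed to mirror what `row.span.text` returns for an assigned cell.

```python
def parse_span_text(text):
    """Split text such as 'Borough(Name / Name)' into (borough, neighborhoods)."""
    parts = text.split('(')
    borough = parts[0]
    # Same chain as in the loop: drop ')', turn ' /' separators into commas, trim spaces
    neighborhood = parts[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    return borough, neighborhood


# Sample inputs are assumptions about the shape of row.span.text
print(parse_span_text('North York(Parkwoods)'))
print(parse_span_text('Downtown Toronto(Regent Park / Harbourfront)'))
```

Note that a cell whose text contains no opening parenthesis would raise an `IndexError` at `parts[1]`, both here and in the loop above.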

### Creating our <em>Pandas</em> DataFrame

Finally, create your <em>pandas</em> dataframe <code>df</code> using the contents of your <code>table_contents</code> object.

In [78]:
df = pd.DataFrame(table_contents)
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga'
})

df.head(20)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [79]:
df.shape

(103, 3)

And there you have it! You have now created your primary dataframe. We will refer back to this dataframe quite a bit as we move forward with the project.

## Importing Coordinates

Next, we must import the location of each neighborhood in Toronto. For convenience, we will use a CSV file provided by the instructors that contains the latitude and longitude coordinates for each postal code in Toronto.

To accomplish this, we will use the following steps:
-  Download and import Geospatial_Coordinates.csv into your project before moving forward
-  Import the <code>Project</code> class from the <code>project_lib</code> package
-  Use the <code>get_file()</code> function of your <code>Project</code> object to locate your CSV file
-  Read the file into a <em>pandas</em> dataframe
-  Finally, merge the two dataframes to create our final dataframe

### Locating and importing our Coordinates file

As stated above, make sure to download and import the Geospatial_Coordinates.csv file into your project. Once the file is in your project, import <code>Project</code> from the <code>project_lib</code> package and create a new object using <code>Project</code>. To initialize <code>Project</code>, go into your project's settings and copy your Project ID and Project Access Token (create one if you haven't already) into the parameters of the newly created object. From there, you can retrieve the file using the <code>get_file</code> function and read its contents into a new <em>pandas</em> dataframe we'll call <code>df_coords</code>.


In [80]:
# The code was removed by Watson Studio for sharing.

In [81]:
my_file = project.get_file("Geospatial_Coordinates.csv")
my_file.seek(0)

df_coords = pd.read_csv(my_file)
df_coords = df_coords.rename(columns = {"Postal Code": "PostalCode"})
df_coords.head(20)

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848
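
Since `pd.read_csv` accepts any file-like object, the same read-and-rename step can be tried outside Watson Studio. The sketch below is only an illustration under that assumption: it uses an in-memory `io.StringIO` buffer as a stand-in for `my_file`, filled with two rows copied from the output above.

```python
import io

import pandas as pd

# Stand-in for the file object returned by project.get_file(); the two rows
# are copied from the output shown above
csv_text = (
    "Postal Code,Latitude,Longitude\n"
    "M1B,43.806686,-79.194353\n"
    "M1C,43.784535,-79.160497\n"
)
my_file = io.StringIO(csv_text)
my_file.seek(0)

df_coords = pd.read_csv(my_file)
# Rename the key column so it matches the 'PostalCode' column in df
df_coords = df_coords.rename(columns={"Postal Code": "PostalCode"})
print(df_coords)
```

Renaming the column matters because the later merge joins on a column named `PostalCode` in both dataframes.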


### Merging our Dataframes

Use an inner join to combine the information in our coordinates dataframe, <code>df_coords</code>, with our primary dataframe, <code>df</code>.

In [82]:
df = df.merge(df_coords, on = "PostalCode", how = "inner")
df.head(20)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [83]:
df.shape

(103, 5)
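
An inner join silently drops rows that have no match in the other dataframe. Here the row count stays at 103 before and after the merge, which suggests nothing was dropped. If the counts ever differ, one way to find the unmatched postal codes is a left join with `indicator=True`. The sketch below uses small made-up frames; the postal code `M9Z` is a hypothetical value chosen so that one row has no match.

```python
import pandas as pd

# Toy data: 'M9Z' is a hypothetical code with no coordinates
left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M9Z'],
                     'Neighborhood': ['Parkwoods', 'Victoria Village', 'Example']})
right = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# indicator=True adds a '_merge' column telling where each row was found
checked = left.merge(right, on='PostalCode', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```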

And there we have it! Our dataframe now contains all the information we will need for the next major portion of our project!

## Creating and Visualizing Clusters within Toronto

We are now ready to visualize the city of Toronto and create a series of clusters within one of its boroughs. Because this is a multi-step process that is too long for a quick bullet-point summary, we will walk through each step individually until the end of the project.

Without further ado, let's begin!

We can start off by dropping the **PostalCode** column from our dataframe, as we'll no longer need it for the rest of the project.

In [84]:
df = df.drop(columns = 'PostalCode')
df.head(20)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,North York,Parkwoods,43.753259,-79.329656
1,North York,Victoria Village,43.725882,-79.315572
2,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,Etobicoke,Islington Avenue,43.667856,-79.532242
6,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,North York,Don Mills North,43.745906,-79.352188
8,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


### Visualizing Toronto

Before we can start thinking about our clusters, we need to visualize the city of Toronto and the location of Toronto's neighborhoods within the city. Before moving on, make sure to install and import the <code>folium</code> package.

In [85]:
!pip install folium
import folium



Using the <code>Nominatim</code> geocoder from <code>geopy</code>, create a <code>geolocator</code> object and retrieve the coordinates of the city of Toronto using the <code>geocode</code> function. Save the result to an object called <code>location</code>, and save its latitude and longitude to two variables called <code>latitude</code> and <code>longitude</code>.

In [86]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent = 'tor_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
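
The `geocode` function returns `None` when it finds no match for the address, in which case accessing `location.latitude` raises an `AttributeError`. The sketch below shows one possible guard. `coords_or_default` is a hypothetical helper, and `SimpleNamespace` is used only as a stand-in for a geocoder result so the example runs without a network call.

```python
from types import SimpleNamespace


def coords_or_default(location, default):
    """Return (latitude, longitude) from a geocoder result, or `default` if it is None."""
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Fallback taken from the coordinates printed above
toronto_default = (43.6534817, -79.3839347)

# Stand-in for a successful geocoder result (illustrative values)
found = SimpleNamespace(latitude=43.65, longitude=-79.38)

print(coords_or_default(found, toronto_default))
print(coords_or_default(None, toronto_default))
```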


Using the <code>folium</code> package, create a map of Toronto using the coordinate objects you created previously. Additionally, loop through our dataframe <code>df</code>, and insert both a circle marker and a popup label for each individual neighborhood into our map using the <code>CircleMarker</code> function from <code>folium</code>.

Feel free to explore our map and identify where each of the major boroughs and neighborhoods of Toronto is located. We will shortly choose one of these boroughs as the staging ground for our clusters.

In [87]:
map_toronto = folium.Map(location = [latitude, longitude], zoom_start = 10)

for lat, long, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        color = "blue",
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(map_toronto)
    
map_toronto

### Choosing and Visualizing our Borough

For the purposes of this project, we will use Downtown Toronto as the staging grounds for our clusters. Create a new <em>pandas</em> dataframe called <code>downtown_data</code> to isolate and hold only the neighborhoods inside Downtown Toronto.

In [88]:
downtown_data = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop = True)
downtown_data.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,Downtown Toronto,St. James Town,43.651494,-79.375418
3,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,Downtown Toronto,Central Bay Street,43.657952,-79.387383
5,Downtown Toronto,Christie,43.669542,-79.422564
6,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
7,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
8,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576
9,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817


Repeat the process from **Visualizing Toronto** above to now only visualize Downtown Toronto and all of its neighborhoods.

In [89]:
address = 'Downtown Toronto, Ontario'

geolocator = Nominatim(user_agent = 'tor_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


In [90]:
map_downtown = folium.Map(location = [latitude, longitude], zoom_start = 13)

for lat, long, borough, neighborhood in zip(downtown_data['Latitude'], downtown_data['Longitude'], downtown_data['Borough'], downtown_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        color = "blue",
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
    ).add_to(map_downtown)
    
map_downtown

### Retrieving Neighborhood Venues

Now that we have an idea of where the neighborhoods of Downtown Toronto are located, we can start organizing them into clusters. We will group the neighborhoods by the mix of venue categories found near them. To do this, we need the location and category of the venues near each neighborhood.

Fortunately for us, we can request and retrieve this information through the FourSquare API. If you haven't already, create a free account with FourSquare (https://foursquare.com/). Then locate your CLIENT ID and CLIENT SECRET within your account, and enter them below.


In [91]:
# The code was removed by Watson Studio for sharing.
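
The hidden cell is not shown, but the function further down also relies on two more names, `VERSION` and `LIMIT`, so it presumably defines all four. The sketch below is a guess at its shape, not the author's actual cell. The environment variable names are hypothetical, the `VERSION` value is only a placeholder in the `YYYYMMDD` date format the API expects, and `LIMIT = 100` is an assumption based on the maximum of 100 venues per neighborhood seen in the results.

```python
import os

# Hypothetical environment variable names; reading credentials from the
# environment keeps them out of the shared notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# Placeholder API version date in YYYYMMDD format (assumed value)
VERSION = '20180605'

# Assumed maximum number of venues returned per request
LIMIT = 100
```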

Having entered and saved our FourSquare API credentials, we are now ready to request and retrieve the information for any venues located near our neighborhoods.

For this portion, we will use a user-defined function called <code>getNearbyVenues()</code> to retrieve nearby venues within 1200 meters of each neighborhood in Downtown Toronto. The function starts by creating a list called <code>venues_list</code> to hold the information for each venue retrieved from FourSquare. For each neighborhood, it then calls the FourSquare API to request the name, location, and category of venues within 1200 meters (up to <code>LIMIT</code> results per request), using the coordinates from <code>downtown_data</code> and the FourSquare credentials entered above, and appends the results to <code>venues_list</code>.

Finally, the function creates a new dataframe from the data contained in <code>venues_list</code> and returns it.

We will call this newly created dataframe <code>downtown_venues</code>.

In [92]:
def getNearbyVenues(names, latitudes, longitudes, radius = 1200):
    
    venues_list = []
    
    for name, lat, long in zip(names, latitudes, longitudes):
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            long, 
            radius, 
            LIMIT)
        
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        venues_list.append([(
            name,
            lat,
            long,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    
    return(nearby_venues)
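
The list comprehension inside `getNearbyVenues()` assumes every venue has at least one category; a venue with an empty `categories` list would raise an `IndexError`. The sketch below isolates the extraction step so that it can be tried without calling the API. `extract_venue_rows` is a hypothetical helper, and `sample_items` is made-up data shaped like the `items` list the function reads from the response (the second venue is invented to show the empty-category case).

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn a list of response items into rows like those built in getNearbyVenues()."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of failing when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Made-up items mirroring the structure accessed in getNearbyVenues()
sample_items = [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]

rows = extract_venue_rows('Regent Park, Harbourfront', 43.65426, -79.360636, sample_items)
for row in rows:
    print(row)
```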

In [93]:
downtown_venues = getNearbyVenues(names = downtown_data['Neighborhood'],
                                  latitudes = downtown_data['Latitude'],
                                  longitudes = downtown_data['Longitude'])

downtown_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site


In [94]:
print(downtown_venues.shape)

(1581, 7)


Check to see how many venues were returned for each neighborhood.

In [95]:
downtown_venues.groupby("Neighborhood").count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",31,31,31,31,31,31
Central Bay Street,100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",100,100,100,100,100,100


Determine the number of unique venue categories in Downtown Toronto overall.

In [96]:
print("There are {} unique categories".format(len(downtown_venues['Venue Category'].unique())))

There are 199 unique categories


### Analyzing Neighborhood Venues

We'll now create a new dataframe called <code>downtown_oneshot</code> containing a one-hot encoding of the venue category for every venue returned for Downtown Toronto. Use the <code>get_dummies()</code> function to accomplish this. Then copy the neighborhood column from <code>downtown_venues</code> into <code>downtown_oneshot</code> and move it to the front.

In [97]:
downtown_oneshot = pd.get_dummies(downtown_venues[['Venue Category']], prefix = "", prefix_sep = "")

downtown_oneshot['Neighborhood'] = downtown_venues['Neighborhood']

cols = downtown_oneshot.columns.tolist()
cols.insert(0, cols.pop(cols.index('Neighborhood')))

downtown_oneshot = downtown_oneshot.reindex(columns = cols)

downtown_oneshot.head(10)

Unnamed: 0,Neighborhood,Airport,Airport Lounge,American Restaurant,Animal Shelter,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [98]:
downtown_oneshot.shape

(1581, 199)
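
To see what the one-hot encoding does, it can help to run the same steps on a tiny table. The sketch below uses three made-up venues in two made-up neighborhoods; `dtype=int` is passed only so the output shows 0/1 regardless of the pandas version.

```python
import pandas as pd

# Toy stand-in for downtown_venues
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Coffee Shop', 'Park', 'Coffee Shop'],
})

# One column per category, one row per venue
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot['Neighborhood'] = venues['Neighborhood']

# Move 'Neighborhood' to the first position
cols = ['Neighborhood'] + [c for c in onehot.columns if c != 'Neighborhood']
onehot = onehot[cols]
print(onehot)
```

One caveat of this pattern: if a venue category were itself named "Neighborhood", assigning the `Neighborhood` column would overwrite that category's column.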

Then, create a new dataframe called <code>downtown_grouped</code> containing the mean frequency of occurrence of each venue category for every neighborhood in Downtown Toronto.

Feel free to explore the frequencies of each venue category for every neighborhood before moving on.

In [99]:
downtown_grouped = downtown_oneshot.groupby('Neighborhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighborhood,Airport,Airport Lounge,American Restaurant,Animal Shelter,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Berczy Park,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,...,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258
2,Central Bay Street,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,...,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.02
3,Christie,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,...,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01
7,"Garden District, Ryerson",0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,...,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.02,...,0.0,0.0,0.0,0.01,0.05,0.0,0.02,0.01,0.0,0.03


In [100]:
downtown_grouped.shape

(17, 199)
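
Because each one-hot row contains a single 1, the per-neighborhood mean of a category column equals that category's share of the neighborhood's venues. The sketch below checks this on made-up data: neighborhood A has four venues, three of which are coffee shops.

```python
import pandas as pd

# Toy one-hot table: one row per venue
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Coffee Shop':  [1, 1, 1, 0, 0, 0],
    'Park':         [0, 0, 0, 1, 1, 1],
})

grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)

# Each neighborhood's frequencies sum to 1
row_sums = grouped.drop(columns='Neighborhood').sum(axis=1).tolist()
```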

### Finding the 10 Most Common Venues for each Neighborhood

Next, we'll write a function called <code>return_most_common_venues()</code> that returns the most common venue categories for a neighborhood in <code>downtown_grouped</code>, up to the number given in its <code>num_top_venues</code> argument. Before applying the function, create a list called <code>columns</code> holding the column title for each rank ("1st Most Common Venue", "2nd Most Common Venue", and so on) up to <code>num_top_venues</code>. Create a new dataframe called <code>neighborhoods_venues_sorted</code> using <code>columns</code>. Then apply the function to each neighborhood in <code>downtown_grouped</code> to fill in that neighborhood's 10 most common venue categories.

Feel free to explore the 10 most common venues for any neighborhood in Downtown Toronto to your heart's content before moving on to our final step.

In [101]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
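
As a quick check of what the function returns, the sketch below repeats its definition and applies it to a single made-up row shaped like a row of `downtown_grouped`: the neighborhood name first, followed by category frequencies.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Made-up row for illustration
row = pd.Series({'Neighborhood': 'Example', 'Bar': 0.2, 'Coffee Shop': 0.5, 'Park': 0.3})
top = return_most_common_venues(row, 2)
print(list(top))
```

When several categories share the same frequency, the order among them depends on the sort, so the ranking of tied categories should not be read as meaningful.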

In [102]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues) 
    
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Café,Hotel,Park,Beer Bar,Restaurant,Farmers Market,Plaza,Japanese Restaurant,Gastropub
1,"CN Tower, King and Spadina, Railway Lands, Har...",Café,Park,Coffee Shop,Harbor / Marina,Gym,Track,Yoga Studio,Garden,Scenic Lookout,Ramen Restaurant
2,Central Bay Street,Coffee Shop,Japanese Restaurant,Bubble Tea Shop,Pizza Place,Café,Park,Yoga Studio,Bookstore,Plaza,Burrito Place
3,Christie,Korean Restaurant,Café,Coffee Shop,Grocery Store,Park,Cocktail Bar,Italian Restaurant,Bar,Comedy Club,Eastern European Restaurant
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Park,Italian Restaurant,Sushi Restaurant,Caribbean Restaurant,Restaurant,Ramen Restaurant,Burger Joint,Café
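
The column titles above rely on a three-item suffix list with a fallback to "th", which works for the 10 ranks used here but would label rank 21 as "21th". If more ranks were ever needed, a small helper could handle the general case. `ordinal` below is a hypothetical helper, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```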


### Creating and Visualizing a K-Means Clustering Model

Finally, we've reached our last step: creating and visualizing a series of neighborhood clusters in Downtown Toronto with a k-means clustering model!

Starting off, create a new dataframe called <code>downtown_grouped_clustering</code> by dropping the <code>'Neighborhood'</code> column from <code>downtown_grouped</code>. Now, create our k-means clustering model, called <code>kmeans</code>, by fitting it to the neighborhood data in <code>downtown_grouped_clustering</code> with 5 clusters.

In [103]:
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns = 'Neighborhood')

kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(downtown_grouped_clustering)
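
The number of clusters is fixed at 5 in this notebook. One common heuristic for sanity-checking such a choice, which is not part of the original analysis, is to compare the model's inertia (the within-cluster sum of squared distances) for several values of k. The sketch below shows the idea on a made-up dataset with two obvious groups; `n_init=10` is set explicitly only to keep behavior consistent across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Inertia for k = 1, 2, 3; a sharp drop followed by a plateau hints at a good k
inertias = {}
for k in range(1, 4):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_
print(inertias)

# With k = 2, each group ends up in its own cluster
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X).labels_
print(labels)
```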

Before moving on to visualizing our model, perform the following steps:
- Insert a new column containing the cluster labels of <code>kmeans</code> into <code>neighborhoods_venues_sorted</code>.
- Create a new dataframe called <code>downtown_merged</code> by merging the dataframes <code>downtown_data</code> with <code>neighborhoods_venues_sorted</code>.

View the results of the newly merged dataset below.

In [104]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = downtown_data

downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')

downtown_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Café,Park,Restaurant,Theater,Bakery,Gastropub,Pub,Farmers Market,Italian Restaurant
1,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Coffee Shop,Gastropub,Café,Pizza Place,Cosmetics Shop,Theater,Japanese Restaurant,Clothing Store,Diner,Bookstore
2,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Gastropub,Restaurant,Italian Restaurant,Cosmetics Shop,Park,Japanese Restaurant,Seafood Restaurant,Plaza
3,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Café,Hotel,Park,Beer Bar,Restaurant,Farmers Market,Plaza,Japanese Restaurant,Gastropub
4,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Japanese Restaurant,Bubble Tea Shop,Pizza Place,Café,Park,Yoga Studio,Bookstore,Plaza,Burrito Place


At long last, we'll now visualize our neighborhoods using our <code>kmeans</code> model!

Using <code>folium</code> once again, create a map of Downtown Toronto from the latitude and longitude variables for Downtown Toronto retrieved earlier. Then, use the <code>kclusters</code> variable from above to set up one color for each of the 5 clusters.

Finally, loop through each neighborhood in <code>downtown_merged</code> and mark its position on the map using its latitude, longitude, and cluster label. To accomplish this, we'll use the <code>CircleMarker</code> function once again to add a popup label and a colored circle marker that identify each neighborhood's name, position, and cluster. When the map renders, we should see markers in 5 different colors across Downtown Toronto.

Feel free to explore our new map and what neighborhood belongs to which cluster!

In [105]:
# Center the map on Downtown Toronto
map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 13)

# Build one hex color per cluster from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one circle marker per neighborhood, colored by its cluster label
for lat, long, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        # cluster - 1 means label 0 wraps around to the last color in the list
        color = rainbow[cluster-1],
        fill = True,
        fill_color = rainbow[cluster-1],
        fill_opacity = 0.7).add_to(map_clusters)

# Display the finished map
map_clusters
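
The marker loop above picks each color with <code>rainbow[cluster-1]</code>, so label 0 reads index -1 and wraps around to the last color in the list. The sketch below illustrates that lookup in plain Python. It is only an illustration: the <code>palette</code> list and the <code>marker_color</code> helper are hypothetical, and the color names are approximations borrowed from the cluster headings in this notebook, not the actual hex values <code>cm.rainbow</code> produces.

```python
# Approximate names for five evenly spaced rainbow colors, in list order.
# Assumption: names are taken from the cluster headings in this notebook.
palette = ["purple", "light blue", "aqua green", "tan", "red"]

def marker_color(cluster_label, colors=palette):
    """Mirror the rainbow[cluster - 1] lookup used in the marker loop."""
    return colors[cluster_label - 1]

# Map every cluster label (0..4) to the color its markers receive.
label_to_color = {label: marker_color(label) for label in range(len(palette))}

for label, color in label_to_color.items():
    print(f"label {label} -> {color}")
```

Under these assumptions, label 0 lands on the last entry of the list (red) and every other label is shifted down by one position, which is why the first cluster examined below is shown with red markers.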

## Examining Clusters

Having formed our clusters in Downtown Toronto, we can now examine each cluster in turn and see which neighborhoods fall into it, as well as which venue categories are most common in those neighborhoods. Using the <code>'Cluster Labels'</code> column in <code>downtown_merged</code>, we can isolate each cluster and present the information for every neighborhood within it below. Note that the labels start at 0, so Cluster 1 corresponds to label 0, Cluster 2 to label 1, and so on.
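
Each of the cells below repeats the same selection with a different label. As a minimal sketch, that selection could be wrapped in a small helper. The function name <code>cluster_view</code>, its parameters, and the <code>toy</code> dataframe are hypothetical; the toy rows are copied from the first five rows of <code>downtown_merged</code> shown earlier, trimmed to two venue columns. Unlike the cells below, the helper picks columns by name rather than by position.

```python
import pandas as pd

def cluster_view(df, label, label_col="Cluster Labels",
                 drop_cols=("Borough", "Latitude", "Longitude")):
    """Return the rows of one cluster, keeping the neighborhood and venue columns."""
    keep = [c for c in df.columns if c != label_col and c not in drop_cols]
    return df.loc[df[label_col] == label, keep]

# Toy stand-in for downtown_merged (first five rows, two venue columns only)
toy = pd.DataFrame({
    "Borough": ["Downtown Toronto"] * 5,
    "Neighborhood": ["Regent Park, Harbourfront", "Garden District, Ryerson",
                     "St. James Town", "Berczy Park", "Central Bay Street"],
    "Latitude": [43.65426, 43.657162, 43.651494, 43.644771, 43.657952],
    "Longitude": [-79.360636, -79.378937, -79.375418, -79.373306, -79.387383],
    "Cluster Labels": [1, 0, 1, 1, 0],
    "1st Most Common Venue": ["Coffee Shop"] * 5,
    "2nd Most Common Venue": ["Café", "Gastropub", "Café", "Café",
                              "Japanese Restaurant"],
})

# Label 0 is the cluster presented as "Cluster 1" in this notebook
first_cluster = cluster_view(toy, 0)
print(first_cluster)
```

Selecting by column name keeps the helper working even if the column order of the dataframe changes, at the cost of hard-coding the names to drop.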

#### Cluster 1 - Red Markers

In [106]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Garden District, Ryerson",Coffee Shop,Gastropub,Café,Pizza Place,Cosmetics Shop,Theater,Japanese Restaurant,Clothing Store,Diner,Bookstore
4,Central Bay Street,Coffee Shop,Japanese Restaurant,Bubble Tea Shop,Pizza Place,Café,Park,Yoga Studio,Bookstore,Plaza,Burrito Place
14,"St. James Town, Cabbagetown",Coffee Shop,Park,Café,Thai Restaurant,Gastropub,Diner,Restaurant,Japanese Restaurant,Pizza Place,Dance Studio
16,Church and Wellesley,Coffee Shop,Japanese Restaurant,Park,Italian Restaurant,Sushi Restaurant,Caribbean Restaurant,Restaurant,Ramen Restaurant,Burger Joint,Café


#### Cluster 2 - Purple Markers

In [107]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Coffee Shop,Café,Park,Restaurant,Theater,Bakery,Gastropub,Pub,Farmers Market,Italian Restaurant
2,St. James Town,Coffee Shop,Café,Gastropub,Restaurant,Italian Restaurant,Cosmetics Shop,Park,Japanese Restaurant,Seafood Restaurant,Plaza
3,Berczy Park,Coffee Shop,Café,Hotel,Park,Beer Bar,Restaurant,Farmers Market,Plaza,Japanese Restaurant,Gastropub
6,"Richmond, Adelaide, King",Coffee Shop,Gastropub,Café,Cosmetics Shop,Restaurant,Plaza,Park,Theater,Sandwich Place,Japanese Restaurant
7,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Café,Hotel,Park,Brewery,Japanese Restaurant,Scenic Lookout,Gym,Theater,Plaza
8,"Toronto Dominion Centre, Design Exchange",Café,Coffee Shop,Hotel,Japanese Restaurant,Restaurant,Beer Bar,Cosmetics Shop,Theater,Park,Plaza
9,"Commerce Court, Victoria Hotel",Coffee Shop,Café,Hotel,Restaurant,Theater,Seafood Restaurant,Beer Bar,Japanese Restaurant,Gastropub,Cosmetics Shop
15,"First Canadian Place, Underground city",Coffee Shop,Café,Gastropub,Japanese Restaurant,Plaza,Park,Theater,Art Gallery,Hotel,Monument / Landmark


#### Cluster 3 - Light Blue Markers

In [108]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Christie,Korean Restaurant,Café,Coffee Shop,Grocery Store,Park,Cocktail Bar,Italian Restaurant,Bar,Comedy Club,Eastern European Restaurant
10,"University of Toronto, Harbord",Café,Vegetarian / Vegan Restaurant,Bakery,Mexican Restaurant,Bar,Beer Bar,Coffee Shop,Bookstore,Restaurant,Japanese Restaurant
11,"Kensington Market, Chinatown, Grange Park",Café,Vegetarian / Vegan Restaurant,Coffee Shop,Bar,Yoga Studio,Park,Art Gallery,Mexican Restaurant,Ramen Restaurant,Pizza Place


#### Cluster 4 - Aqua Green Markers

In [109]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,"CN Tower, King and Spadina, Railway Lands, Har...",Café,Park,Coffee Shop,Harbor / Marina,Gym,Track,Yoga Studio,Garden,Scenic Lookout,Ramen Restaurant


#### Cluster 5 - Tan Markers

In [110]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Rosedale,Park,Coffee Shop,Italian Restaurant,Grocery Store,Café,Bank,Metro Station,Juice Bar,Bar,Japanese Restaurant
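

As a quick check on the five tables above, the cluster sizes can be tallied from the label column. The sketch below is illustrative only: the <code>labels</code> series is typed in by hand from the row indices shown in those tables (17 neighborhoods in total), and stands in for <code>downtown_merged['Cluster Labels']</code>.

```python
import pandas as pd

# Cluster label of each neighborhood, ordered by row index 0..16.
# Assumption: values are transcribed from the cluster tables above.
labels = pd.Series(
    [1, 0, 1, 1, 0, 2, 1, 1, 1, 1, 2, 2, 3, 4, 0, 1, 0],
    name="Cluster Labels",
)

# Count the neighborhoods per label, ordered by label
sizes = labels.value_counts().sort_index()
print(sizes)
```

With these labels, label 1 (Cluster 2) holds 8 of the 17 neighborhoods, while labels 3 and 4 (Clusters 4 and 5) contain a single neighborhood each.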


### Thank You for Viewing this Project!

Written by John Muchinsky with aid from Alex Aklson and Polong Lin of the IBM Applied Data Science Capstone Course on Coursera