# CAPSTONE PROJECT - IBM Data Science Professional Certificate
## Author: Eduardo Gaona P.

This Jupyter notebook will serve as the main platform to solve all the tasks from the capstone project of the IBM Data Science Professional Certificate

## Importing the libraries

In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests  # this module helps us to download a web page

Setting the URL for the table and using "request" to get the http information of it

In [2]:
url_wiki = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Using BeautifulSoup to get the "table" object

In [3]:
data  = requests.get(url_wiki).text
soup = BeautifulSoup(data,"html5lib")
tables = soup.find_all('table')[0] # in html table is represented by the tag <table>

Showing an example of an entry of the table

In [4]:
tables.find_all("td")[54]

<td style="vertical-align:top;">
<p>M1J<br/><span style="font-size:85%;"><a href="/wiki/Scarborough,_Toronto" title="Scarborough, Toronto">Scarborough</a><br/>(<a href="/wiki/Scarborough_Village" title="Scarborough Village">Scarborough Village</a>)</span>
</p>
</td>

In [5]:
tables.find_all("td")[54].text

'\nM1JScarborough(Scarborough Village)\n\n'

Going through all the entries, extracting the text and forming the DataFrame

In [6]:
postal_codes = pd.DataFrame(columns=["PostalCode", "Borough", "Neighborhood"])

for PostalCode in tables.find_all("td"):
  if 'Not assigned' not in PostalCode.text:
    Code = PostalCode.text.replace('\n','')[0:3]
    Borough = PostalCode.text.replace('\n','')[3:PostalCode.text.replace('\n','').find('(')]
    Neighborhood = PostalCode.text.replace('\n','')[PostalCode.text.replace('\n','').find('(')+1:-1].replace(' / ',',')
    postal_codes = postal_codes.append({"PostalCode":Code, "Borough":Borough, "Neighborhood":Neighborhood}, ignore_index=True)

postal_codes.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park,Harbourfront"
3,M6A,North York,"Lawrence Manor,Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


In [7]:
postal_codes[postal_codes['PostalCode'] == 'M5G']

Unnamed: 0,PostalCode,Borough,Neighborhood
24,M5G,Downtown Toronto,Central Bay Street


In [8]:
postal_codes.shape

(103, 3)

## Adding the longitude and latitude of the neighborhoods

### Using the .csv file:

In [9]:
pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
[?25l[K     |███▎                            | 10 kB 25.3 MB/s eta 0:00:01[K     |██████▋                         | 20 kB 28.8 MB/s eta 0:00:01[K     |██████████                      | 30 kB 16.0 MB/s eta 0:00:01[K     |█████████████▎                  | 40 kB 12.0 MB/s eta 0:00:01[K     |████████████████▋               | 51 kB 5.3 MB/s eta 0:00:01[K     |████████████████████            | 61 kB 5.7 MB/s eta 0:00:01[K     |███████████████████████▎        | 71 kB 5.8 MB/s eta 0:00:01[K     |██████████████████████████▋     | 81 kB 5.9 MB/s eta 0:00:01[K     |██████████████████████████████  | 92 kB 6.1 MB/s eta 0:00:01[K     |████████████████████████████████| 98 kB 3.4 MB/s 
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [10]:
Path_csv_loc = '/content/drive/MyDrive/Coursera/Geospatial_Coordinates.csv'
df_loc = pd.read_csv(Path_csv_loc)

In [11]:
df_loc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
postal_codes_loc = postal_codes.set_index('PostalCode').join(df_loc.set_index('Postal Code')).reset_index()
postal_codes_loc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor,Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494


In [13]:
postal_codes_loc[postal_codes_loc['PostalCode'] == 'M5G']

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


### Using pgeocode to query for the location:

In [14]:
pip install pgeocode


Collecting pgeocode
  Downloading pgeocode-0.3.0-py3-none-any.whl (8.5 kB)
Installing collected packages: pgeocode
Successfully installed pgeocode-0.3.0


In [15]:
import pgeocode
nomi = pgeocode.Nominatim('ca')
postal_codes_loc2 = postal_codes.copy()
latitude_list = []
longitude_list = []
for post_code in postal_codes['PostalCode']:
  location = nomi.query_postal_code(post_code)
  latitude_list.append(location.latitude)
  longitude_list.append(location.longitude)
postal_codes_loc2["Latitude"] = latitude_list
postal_codes_loc2["Longitude"] = longitude_list
postal_codes_loc2.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor,Lawrence Heights",43.7223,-79.4504
4,M7A,Queen's Park,Ontario Provincial Government,43.6641,-79.3889


Getting the not found coordinates from the .csv based DataFrame

In [16]:
postal_codes_loc2[postal_codes_loc2["Latitude"].isna()]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,MississaugaCanada Post Gateway Processing Centre,Enclave of L4W,,


In [17]:
postal_codes_loc[postal_codes_loc["PostalCode"] == 'M7R']

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,MississaugaCanada Post Gateway Processing Centre,Enclave of L4W,43.636966,-79.615819


In [18]:
postal_codes_loc2[postal_codes_loc2["Latitude"].isna()] = postal_codes_loc[postal_codes_loc2["Latitude"].isna()]

In [19]:
postal_codes_loc2[postal_codes_loc2["PostalCode"] == 'M7R']

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,MississaugaCanada Post Gateway Processing Centre,Enclave of L4W,43.636966,-79.615819


## Exploring the neighborhoods 

In [20]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

In [21]:
location_Toronto = nomi.query_postal_code('M5A')
print("Toronto coordinates are: Latitude {}, Longitude {}".format(location_Toronto.latitude,location_Toronto.longitude))

Toronto coordinates are: Latitude 43.6555, Longitude -79.3626


In [22]:
# create map of New York using latitude and longitude values
toronto_map = folium.Map(location=[location_Toronto.latitude, location_Toronto.longitude], zoom_start=10)
# add markers to map
for lat, lng, borough, neighborhood in zip(postal_codes_loc2['Latitude'], postal_codes_loc2['Longitude'], postal_codes_loc2['Borough'], postal_codes_loc2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map


In [23]:
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe


We'll use the coordinates of Downtown Toronto to query for the locations in Toronto

In [24]:
postal_codes_loc2.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor,Lawrence Heights",43.7223,-79.4504
4,M7A,Queen's Park,Ontario Provincial Government,43.6641,-79.3889


In [25]:
float(postal_codes_loc2[postal_codes_loc2["PostalCode"]=="M5A"].Latitude)

43.6555

In [26]:
CLIENT_ID = 'OHKSEBZWUUR5QM1CDJRHMY3O2ASA0DZOS1DGXSM2WM2IM3QY' # your Foursquare ID
CLIENT_SECRET = 'THKUAKVGUXJU5WFKYZ3JHQSG5PKDJXQ1U345KCUCDXYFA1BM' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    float(postal_codes_loc2[postal_codes_loc2["PostalCode"]=="M5A"].Latitude), 
    float(postal_codes_loc2[postal_codes_loc2["PostalCode"]=="M5A"].Longitude), 
    radius, 
    LIMIT)
url # display URL

results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '60fe4432e3607e45a3d3db36'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-53b8466a498e83df908c3f21-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1e0931735',
         'name': 'Coffee Shop',
         'pluralName': 'Coffee Shops',
         'primary': True,
         'shortName': 'Coffee Shop'}],
       'id': '53b8466a498e83df908c3f21',
       'location': {'address': '368 King St E',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'at Trinity St',
        'distance': 225,
        'formattedAddress': ['368 King St E (at Trinity St)',
         'Toronto ON',
         'Canada'],
        'labeledLatLngs': [{'labe

In [27]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [44]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Tandem Coffee,Coffee Shop,43.653559,-79.361809
1,Roselle Desserts,Bakery,43.653447,-79.362017
2,Figs Breakfast & Lunch,Breakfast Spot,43.655675,-79.364503
3,The Yoga Lounge,Yoga Studio,43.655515,-79.364955
4,Sumach Espresso,Coffee Shop,43.658135,-79.359515


In [45]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
venues_Toronto = getNearbyVenues(postal_codes_loc2['Neighborhood'],postal_codes_loc2['Latitude'],postal_codes_loc2['Longitude'])

In [79]:
venues_Toronto.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.7545,-79.33,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.7545,-79.33,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.7545,-79.33,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.7276,-79.3148,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.7276,-79.3148,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [80]:
print('Number of categories found: {}'.format(len(venues_Toronto["Venue Category"].unique())))

Number of categories found: 261


There are some venues that are in fact neighborhoods. We can therefore drop them to avoid confusion

In [81]:
venues_Toronto[venues_Toronto["Venue Category"]=='Neighborhood']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
429,The Beaches,43.6784,-79.2941,Upper Beaches,43.680563,-79.292869,Neighborhood
574,Central Bay Street,43.6564,-79.386,Downtown Toronto,43.653232,-79.385296,Neighborhood
722,"Richmond,Adelaide,King",43.6496,-79.3833,Downtown Toronto,43.653232,-79.385296,Neighborhood
1095,"Brockton,Parkdale Village,Exhibition Place",43.6383,-79.4301,Parkdale,43.640524,-79.4322,Neighborhood
1851,Enclave of M5E,43.6437,-79.3787,Harbourfront,43.639526,-79.380688,Neighborhood


In [82]:
venues_Toronto = venues_Toronto.drop(venues_Toronto[venues_Toronto["Venue Category"]=='Neighborhood'].index)

To be able to cluster the neighborhoods we will use assigned locations. In order to quantify them we can use the "one hot" coding and group by neighborhood.

In [83]:
venues_Toronto_quant = pd.concat([venues_Toronto['Neighborhood'],pd.get_dummies(venues_Toronto[['Venue Category']], prefix="", prefix_sep="")],axis=1)
venues_Toronto_quant.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,...,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


After having our one hot coded DataFrame we can then group by each neighborhood and take the mean for each categoery. This will give us a footprint of the how each neighborhood is characterized

In [86]:
venues_Toronto_quant= venues_Toronto_quant.groupby("Neighborhood").mean().reset_index()
venues_Toronto_quant

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,...,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt),0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor,Wilson Heights,Downsview North",0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park,Lawrence Manor East",0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,"Willowdale,Newtonbrook",0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
96,Woburn,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,Woodbine Heights,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
98,York Mills West,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Most common venues per Neighborhood

We can look at the 5 most common venues in each neighborhood:

In [123]:
num_venues = 10
dict_freq_venues = {}
for i,neighborhood_i in enumerate(venues_Toronto_quant["Neighborhood"]):
  values_neighborhood = venues_Toronto_quant.iloc[i].values[1:]
  dict_freq_venues[neighborhood_i] = [venues_Toronto_quant.iloc[i].index.to_list()[1:][x] for x in sorted(range(len(values_neighborhood)), key=lambda k: values_neighborhood[k])[::-1]][0:num_venues]

In [None]:
dict_freq_venues

Let´s create a DataFrame with the ten most common venues for each neighborhood

In [119]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Toronto_neighborhood_venues = pd.DataFrame(columns=columns)
Toronto_neighborhood_venues["Neighborhood"] = venues_Toronto_quant["Neighborhood"]



In [122]:
dict_freq_venues[Toronto_neighborhood_venues.iloc[0,0]]

['Skating Rink',
 'Latin American Restaurant',
 'Breakfast Spot',
 'Badminton Court',
 'Yoga Studio']

In [125]:
Toronto_neighborhood_venues.shape
for i,neighborhood in enumerate(Toronto_neighborhood_venues["Neighborhood"]):
  Toronto_neighborhood_venues.iloc[i,1:] = dict_freq_venues[neighborhood]

In [131]:
Toronto_neighborhood_venues.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt),Skating Rink,Latin American Restaurant,Breakfast Spot,Badminton Court,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store
1,"Alderwood,Long Branch",Sandwich Place,Pub,Pizza Place,Pharmacy,Gym,Convenience Store,Coffee Shop,Yoga Studio,Women's Store,Wings Joint
2,"Bathurst Manor,Wilson Heights,Downsview North",Pizza Place,Middle Eastern Restaurant,Mediterranean Restaurant,Grocery Store,Fried Chicken Joint,Deli / Bodega,Coffee Shop,Yoga Studio,Women's Store,Wings Joint
3,Bayview Village,Trail,Park,Gas Station,Dog Run,Construction & Landscaping,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar
4,"Bedford Park,Lawrence Manor East",Italian Restaurant,Sandwich Place,Coffee Shop,Thai Restaurant,Sushi Restaurant,Restaurant,Pub,Pizza Place,Pharmacy,Pet Store


We can join our DataFrames to include the geographical information of the Neighborhood

In [130]:
Toronto_complete = postal_codes_loc2.set_index('Neighborhood').join(Toronto_neighborhood_venues.set_index('Neighborhood')).reset_index()
Toronto_complete.head()

Unnamed: 0,Neighborhood,PostalCode,Borough,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,M3A,North York,43.7545,-79.33,Park,Food & Drink Shop,Fast Food Restaurant,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant
1,Victoria Village,M4A,North York,43.7276,-79.3148,Portuguese Restaurant,Pizza Place,Park,Intersection,Hockey Arena,French Restaurant,Coffee Shop,Yoga Studio,Women's Store,Wings Joint
2,"Regent Park,Harbourfront",M5A,Downtown Toronto,43.6555,-79.3626,Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Thai Restaurant,Sushi Restaurant,Spa,Restaurant,Pub,Playground
3,"Lawrence Manor,Lawrence Heights",M6A,North York,43.7223,-79.4504,Clothing Store,Coffee Shop,Women's Store,Shoe Store,Restaurant,Toy / Game Store,Sushi Restaurant,Sandwich Place,Men's Store,Jewelry Store
4,Ontario Provincial Government,M7A,Queen's Park,43.6641,-79.3889,Sushi Restaurant,Gym,Vegetarian / Vegan Restaurant,Theater,Ramen Restaurant,Persian Restaurant,Park,Mexican Restaurant,Martial Arts School,Japanese Restaurant


## Clustering the neighborhoods

Using our quantified DataFrame we can cluster our neighborhoods

In [178]:
# set number of clusters
kclusters = 4

toronto_clustering = venues_Toronto_quant.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 3, 0, 0, 0, 0, 0, 0], dtype=int32)

Adding the cluster label to our quantified DataFrame and then joining it with our complete DataFrame of venues:

In [179]:
venues_Toronto_quant["Cluster"] = kmeans.labels_
venues_Toronto_quant

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,...,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Cluster
0,Agincourt),0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2,"Bathurst Manor,Wilson Heights,Downsview North",0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
3,Bayview Village,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
4,"Bedford Park,Lawrence Manor East",0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,"Willowdale,Newtonbrook",0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
96,Woburn,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
97,Woodbine Heights,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
98,York Mills West,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3


In [180]:
Toronto_complete2 = Toronto_complete.set_index('Neighborhood').join(venues_Toronto_quant.set_index('Neighborhood')["Cluster"]).reset_index().dropna()

In [184]:
Toronto_complete2.head()

Unnamed: 0,Neighborhood,PostalCode,Borough,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
0,Parkwoods,M3A,North York,43.7545,-79.33,Park,Food & Drink Shop,Fast Food Restaurant,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,3.0
1,Victoria Village,M4A,North York,43.7276,-79.3148,Portuguese Restaurant,Pizza Place,Park,Intersection,Hockey Arena,French Restaurant,Coffee Shop,Yoga Studio,Women's Store,Wings Joint,0.0
2,"Regent Park,Harbourfront",M5A,Downtown Toronto,43.6555,-79.3626,Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Thai Restaurant,Sushi Restaurant,Spa,Restaurant,Pub,Playground,0.0
3,"Lawrence Manor,Lawrence Heights",M6A,North York,43.7223,-79.4504,Clothing Store,Coffee Shop,Women's Store,Shoe Store,Restaurant,Toy / Game Store,Sushi Restaurant,Sandwich Place,Men's Store,Jewelry Store,0.0
4,Ontario Provincial Government,M7A,Queen's Park,43.6641,-79.3889,Sushi Restaurant,Gym,Vegetarian / Vegan Restaurant,Theater,Ramen Restaurant,Persian Restaurant,Park,Mexican Restaurant,Martial Arts School,Japanese Restaurant,0.0


In [181]:
# create map
map_clusters = folium.Map(location=[location_Toronto.latitude, location_Toronto.longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_complete2['Latitude'], Toronto_complete2['Longitude'], Toronto_complete2['Neighborhood'], Toronto_complete2['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Exploring the clusters

### Cluster 1:

In [192]:
Cluster1 = Toronto_complete2.loc[Toronto_complete2['Cluster'] == 0, Toronto_complete2.columns[[0] + list(range(5, Toronto_complete2.shape[1]))]]
Cluster1.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
1,Victoria Village,Portuguese Restaurant,Pizza Place,Park,Intersection,Hockey Arena,French Restaurant,Coffee Shop,Yoga Studio,Women's Store,Wings Joint,0.0
2,"Regent Park,Harbourfront",Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Thai Restaurant,Sushi Restaurant,Spa,Restaurant,Pub,Playground,0.0
3,"Lawrence Manor,Lawrence Heights",Clothing Store,Coffee Shop,Women's Store,Shoe Store,Restaurant,Toy / Game Store,Sushi Restaurant,Sandwich Place,Men's Store,Jewelry Store,0.0
4,Ontario Provincial Government,Sushi Restaurant,Gym,Vegetarian / Vegan Restaurant,Theater,Ramen Restaurant,Persian Restaurant,Park,Mexican Restaurant,Martial Arts School,Japanese Restaurant,0.0
9,"Garden District, Ryerson",Coffee Shop,Clothing Store,Café,Japanese Restaurant,Hotel,Cosmetics Shop,Theater,Tanning Salon,Ramen Restaurant,Pizza Place,0.0


In [197]:
Cluster1['1st Most Common Venue'].value_counts()

Coffee Shop                   11
Café                           5
Park                           5
Sushi Restaurant               4
Trail                          3
Hotel                          3
Skating Rink                   2
Clothing Store                 2
Italian Restaurant             2
Rental Service                 1
Sports Bar                     1
Greek Restaurant               1
Japanese Restaurant            1
Pub                            1
Baseball Field                 1
Grocery Store                  1
Bar                            1
Women's Store                  1
Construction & Landscaping     1
Tennis Court                   1
Ramen Restaurant               1
Brewery                        1
Burrito Place                  1
Portuguese Restaurant          1
Shoe Store                     1
Intersection                   1
Light Rail Station             1
Restaurant                     1
Name: 1st Most Common Venue, dtype: int64

This cluster is for neighborhoods with coffe shops 

### Cluster 2:

In [193]:
Cluster2 = Toronto_complete2.loc[Toronto_complete2['Cluster'] == 1, Toronto_complete2.columns[[0] + list(range(5, Toronto_complete2.shape[1]))]]
Cluster2.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
6,"Malvern,Rouge",Home Service,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,Video Store,Video Game Store,1.0
8,"Parkview Hill,Woodbine Gardens",Pizza Place,Pharmacy,Pet Store,Intersection,Gym / Fitness Center,Gastropub,Flea Market,Café,Breakfast Spot,Bank,1.0
10,Glencairn,Pizza Place,Mediterranean Restaurant,Latin American Restaurant,Japanese Restaurant,Ice Cream Shop,Grocery Store,Gas Station,Fast Food Restaurant,Bakery,Asian Restaurant,1.0
11,"West Deane Park,Princess Gardens,Martin Grove,...",Pizza Place,Tea Room,Sandwich Place,Coffee Shop,Chinese Restaurant,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,1.0
17,"Eringate,Bloordale Gardens,Old Burnhamthorpe,M...",Shopping Plaza,Pizza Place,Pharmacy,Pet Store,Liquor Store,Coffee Shop,Café,Beer Store,Yoga Studio,Women's Store,1.0


In [198]:
Cluster2['1st Most Common Venue'].value_counts()

Pizza Place              6
Home Service             2
Supermarket              2
Sandwich Place           2
Convenience Store        2
Nightclub                1
Fast Food Restaurant     1
Shopping Mall            1
Shopping Plaza           1
Korean BBQ Restaurant    1
Ice Cream Shop           1
Grocery Store            1
Indian Restaurant        1
Coffee Shop              1
Name: 1st Most Common Venue, dtype: int64

This cluster is for neighborhoods with pizza places

### Cluster 3:

In [199]:
Cluster3 = Toronto_complete2.loc[Toronto_complete2['Cluster'] == 2, Toronto_complete2.columns[[0] + list(range(5, Toronto_complete2.shape[1]))]]
Cluster3.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
5,Islington Avenue,Pharmacy,Skating Rink,Park,Grocery Store,Bank,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,2.0
14,Woodbine Heights,Video Store,Spa,Restaurant,Playground,Convenience Store,Beer Store,Yoga Studio,Women's Store,Wings Joint,Wine Bar,2.0
27,Hillcrest Village,Residential Building (Apartment / Condo),Park,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,Video Store,2.0
52,"Willowdale,Newtonbrook",Playground,Electronics Store,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,Video Store,2.0
69,"High Park,The Junction South",Residential Building (Apartment / Condo),Park,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,Video Store,2.0


In [200]:
Cluster3['1st Most Common Venue'].value_counts()

Residential Building (Apartment / Condo)    2
Pharmacy                                    2
Playground                                  1
Video Store                                 1
Name: 1st Most Common Venue, dtype: int64

This cluster is rather small and not so straight forward to describe

### Cluster 4:

In [201]:
Cluster4 = Toronto_complete2.loc[Toronto_complete2['Cluster'] == 3, Toronto_complete2.columns[[0] + list(range(5, Toronto_complete2.shape[1]))]]
Cluster4.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
0,Parkwoods,Park,Food & Drink Shop,Fast Food Restaurant,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,3.0
7,Don Mills)Nort,Pool,Park,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,Video Store,3.0
13,Don Mills)South(Flemingdon Park,Trail,River,Park,Gym,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,3.0
32,Scarborough Village,Spa,Park,Grocery Store,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,3.0
35,The Danforth East,Park,Greek Restaurant,Convenience Store,Yoga Studio,Women's Store,Wings Joint,Wine Bar,Whisky Bar,Warehouse Store,Vietnamese Restaurant,3.0


In [202]:
Cluster4['1st Most Common Venue'].value_counts()

Park                  5
Trail                 3
Pool                  3
Playground            2
Photography Studio    1
Spa                   1
Name: 1st Most Common Venue, dtype: int64

This neighborhood has mostly parks and trails