# Coursera Final Project
## Battle of the Neighborhoods
### Table of Contents
<a id='table-of-contents'></a>
- [1 Introduction](#1)
- [2 Data Collection](#2)
- [3 Hotels](#3)
- [4 Analyzing hotel Amenities](#4)
- [5 Clustering](#5)

[back to top](#table-of-contents)
<a id='1'></a>
## 1. Introduction
### Business Problem
When people travel to different cities they often want to be located near places they either want to visit, or will find useful (depending on reason for travel). A travel agent might be able to suggest hotels to stay that fit these criteria, but as we have seen with the rise of websites like Booking.com having a website to visit that can tell you details about the hotels and what they are near is useful. The business problem I propose is how to efficienly and effectively find somewhere to stay in a city you have never been to.
People who would be interested in this problem: Frequent travellers, travel agents, websites like Booking.com. 

### The Data
The data will come from two places. Firstly I will use the data from the postal codes of Canada wikipedia article to initially gather the data on neighborhoods in Toronto, before using that data to search FoursquareAPI for hotels in those neighborhoods. I will then again use the FOursquareAPI to find all of the places of interest within a certain distance of those hotels and rank them closest to farthest. After that the hotels will be clustered. I hope this will provide a useful lisst of hotels, clustered by amenities(places of interest), and also ranked by how close they are to those amenities. 

[bak to top](#table-of-contents)
<a id='2'></a>
## 2. Data Collection
### 2.1 Loading in Packages

In [63]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

In [64]:
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(wiki_url).text
soup = BeautifulSoup(data, 'html5lib')

In [158]:
table = soup.find('table')
#print(table.prettify())

- Create a list that will house the data we scrape
- go through each cell in the table on the wiki article and scrape from it the Post Code, Neighborhood and Borough.
    - When scraping the borough and neighborhood we need to split on the brackets as the name not in brackets is the Borough and the name(s) in brackets are the Neighborhood(s). 

In [66]:
table_contents=[]
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

In [67]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


### 2.2 Latitude and Longitude
Since I am unable to use the geocoder module I am instead using the .csv provided by coursera for the Toronto Neighborhoods

In [69]:
coords = pd.read_csv('C:/Users/swamp/Documents/Geospatial_Coordinates.csv')
df = pd.merge(df, coords, on='PostalCode')
df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


[back to top](#table-of-contents)
<a id = '3'></a>
## 3. Obtaining Hotels
### 3.1 Exploring Neighborhoods adjusted
This is similar to the exploring neighborhoods used in the last project, instead this time we will crete a dataframe of the Hotels and their details. 
### 3.1.1 Packages

In [70]:
import json
from geopy.geocoders import Nominatim
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

In [71]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="TR_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geograpical coordinate of Toronto are {latitude}, {longitude}.')

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


### Define Foursquare Credentials and version

In [73]:
CLIENT_ID = 'JSPIGTXNCYYYVMGTZAEMV0OAXVEU544QSOOODC02WV2TJXJA' # your Foursquare ID
CLIENT_SECRET = 'WXHUIRGYZ0CBDXZN2XFPXXGHSYLAVJUZ3G01RM5EP4WLWLP3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: JSPIGTXNCYYYVMGTZAEMV0OAXVEU544QSOOODC02WV2TJXJA
CLIENT_SECRET:WXHUIRGYZ0CBDXZN2XFPXXGHSYLAVJUZ3G01RM5EP4WLWLP3


Now we will explore the neighborhoods to obtain all the hotels in the area

In [86]:
def getNearbyHotels(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories'][0]['name'] =='Hotel'])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Hotel', 
                  'Hotel Latitude', 
                  'Hotel Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Toronto_hotels = getNearbyHotels(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

In [97]:
print(Toronto_hotels.shape)
Toronto_hotels.head()

(43, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Hotel,Hotel Latitude,Hotel Longitude,Venue Category
0,"Garden District, Ryerson",43.657162,-79.378937,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Hotel
1,"Garden District, Ryerson",43.657162,-79.378937,Marriott Downtown at CF Toronto Eaton Centre,43.654728,-79.382422,Hotel
2,St. James Town,43.651494,-79.375418,The Omni King Edward Hotel,43.649191,-79.376006,Hotel
3,St. James Town,43.651494,-79.375418,One King West Hotel & Residence,43.649139,-79.377876,Hotel
4,Berczy Park,43.644771,-79.373306,"The Westin Harbour Castle, Toronto",43.641211,-79.375749,Hotel


In [147]:
drops = ['Neighborhood Latitude' ,'Neighborhood Longitude', 'Venue Category']
T_data = Toronto_hotels.drop(drops, axis = 1)

In [96]:
n_hotels = Toronto_hotels['Hotel'].nunique()
print(n_hotels)

22


Eeven though the dataframe shows `43` hotels, there are only `22` unique ones.

Now we have the dataframe of hotels, we can plot them on a map to see their distribution. After that we can perform a similar function to what we did above for venues near each hotel

In [88]:
# creating map 
map_thotels = folium.Map(location = [latitude, longitude], zoom_start = 11)

# adding markers
for lat, lng, Neighborhood, Hotel in zip(Toronto_hotels['Hotel Latitude'], Toronto_hotels['Hotel Longitude'], Toronto_hotels['Neighborhood'], Toronto_hotels['Hotel']):
    label = f'{Hotel},{Neighborhood}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat, lng],
    radius = 5, 
    popup=label, 
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity = 0.7,
    parse_html=False).add_to(map_thotels)
    
map_thotels

Now we go back and rerun the functino used to get hotels, but only to grab venues and refer them to the hotels they are near to. Before we do that however we need to drop all duplicate hotels

In [106]:
Toronto_hotels.drop_duplicates(['Hotel'], keep='first', inplace =True, ignore_index =True)
Toronto_hotels

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Hotel,Hotel Latitude,Hotel Longitude,Venue Category
0,"Garden District, Ryerson",43.657162,-79.378937,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Hotel
1,"Garden District, Ryerson",43.657162,-79.378937,Marriott Downtown at CF Toronto Eaton Centre,43.654728,-79.382422,Hotel
2,St. James Town,43.651494,-79.375418,The Omni King Edward Hotel,43.649191,-79.376006,Hotel
3,St. James Town,43.651494,-79.375418,One King West Hotel & Residence,43.649139,-79.377876,Hotel
4,Berczy Park,43.644771,-79.373306,"The Westin Harbour Castle, Toronto",43.641211,-79.375749,Hotel
5,"Richmond, Adelaide, King",43.650571,-79.384568,Shangri-La Toronto,43.649129,-79.386557,Hotel
6,"Richmond, Adelaide, King",43.650571,-79.384568,Hilton,43.649946,-79.385479,Hotel
7,"Richmond, Adelaide, King",43.650571,-79.384568,The Adelaide Hotel Toronto,43.649831,-79.380164,Hotel
8,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Delta Hotels by Marriott Toronto,43.642882,-79.383949,Hotel
9,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Le Germain Hotel,43.643125,-79.380918,Hotel


In [130]:
def getNearbyAmenities(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Hotel', 
                  'Hotel Latitude', 
                  'Hotel Longitude', 
                  'Amenity', 
                  'Amenity Latitude', 
                  'Amenity Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

hotel_amenities = getNearbyAmenities(names=Toronto_hotels['Hotel'],
                                   latitudes=Toronto_hotels['Hotel Latitude'],
                                   longitudes=Toronto_hotels['Hotel Longitude']
                                  )
print(hotel_amenities.shape)
hotel_amenities.head()

The Grand Hotel & Suites Toronto
Marriott Downtown at CF Toronto Eaton Centre
The Omni King Edward Hotel
One King West Hotel & Residence
The Westin Harbour Castle, Toronto
Shangri-La Toronto
Hilton
The Adelaide Hotel Toronto
Delta Hotels by Marriott Toronto
Le Germain Hotel
Radisson Admiral Hotel Toronto-Harbourfront
The Fairmont Royal York
The Ritz-Carlton
Cosmopolitan Toronto Centre Hotel & Spa
Gecko Hospitality
Cambridge Suites Toronto
Novotel Toronto North York
Best Western Roehampton Hotel & Suites
Hilton Garden Inn
Courtyard Mississauga-Airport Corporate Centre West
The Anndore House
Town Inn Suites
(1578, 7)


Unnamed: 0,Hotel,Hotel Latitude,Hotel Longitude,Amenity,Amenity Latitude,Amenity Longitude,Venue Category
0,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Page One Cafe,43.657772,-79.376073,Café
1,The Grand Hotel & Suites Toronto,43.656449,-79.37411,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Hotel
2,The Grand Hotel & Suites Toronto,43.656449,-79.37411,GEORGE Restaurant,43.653346,-79.374445,Restaurant
3,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Hokkaido Ramen Santouka らーめん山頭火,43.656435,-79.377586,Ramen Restaurant
4,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Burrito Boyz,43.656265,-79.378343,Burrito Place


In [157]:
for venue in hotel_amenities['Venue Category'].unique():
    print(venue)

Café
Restaurant
Ramen Restaurant
Burrito Place
German Restaurant
Middle Eastern Restaurant
Coffee Shop
Diner
Pizza Place
Theater
Music Venue
Camera Store
Print Shop
College Rec Center
Pakistani Restaurant
Falafel Restaurant
Grocery Store
Bus Stop
Hotel
Clothing Store
Cosmetics Shop
Bookstore
Shopping Mall
Plaza
Bubble Tea Shop
Neighborhood
Fast Food Restaurant
Breakfast Spot
Miscellaneous Shop
Comic Shop
Electronics Store
Poke Place
Tanning Salon
Sushi Restaurant
New American Restaurant
Tea Room
Furniture / Home Store
Gastropub
Movie Theater
Burger Joint
Modern European Restaurant
Dessert Shop
Gym
Monument / Landmark
Office
Mexican Restaurant
American Restaurant
Bank
Toy / Game Store
Department Store
Art Museum
Spa
Japanese Restaurant
Italian Restaurant
Seafood Restaurant
Smoothie Shop
Latin American Restaurant
Art Gallery
Shoe Store
Food Court
Video Game Store
Vietnamese Restaurant
Vegetarian / Vegan Restaurant
Ethiopian Restaurant
Pub
Cocktail Bar
Sandwich Place
Park
Thai Restaurant


In [132]:
for i, row in enumerate(hotel_amenities['Hotel']):
    if str(row) == hotel_amenities['Amenity'][i]:
        hotel_amenities.drop(i, inplace = True)
    
print(hotel_amenities.shape)
hotel_amenities.head()

(1557, 7)


Unnamed: 0,Hotel,Hotel Latitude,Hotel Longitude,Amenity,Amenity Latitude,Amenity Longitude,Venue Category
0,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Page One Cafe,43.657772,-79.376073,Café
2,The Grand Hotel & Suites Toronto,43.656449,-79.37411,GEORGE Restaurant,43.653346,-79.374445,Restaurant
3,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Hokkaido Ramen Santouka らーめん山頭火,43.656435,-79.377586,Ramen Restaurant
4,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Burrito Boyz,43.656265,-79.378343,Burrito Place
5,The Grand Hotel & Suites Toronto,43.656449,-79.37411,Schnitzel Queen,43.654239,-79.370533,German Restaurant


In [136]:
hotel_amenities.reset_index().drop('index', axis = 1, inplace =True)
hotel_amenities

Unnamed: 0,Hotel,Hotel Latitude,Hotel Longitude,Amenity,Amenity Latitude,Amenity Longitude,Venue Category
0,The Grand Hotel & Suites Toronto,43.656449,-79.374110,Page One Cafe,43.657772,-79.376073,Café
2,The Grand Hotel & Suites Toronto,43.656449,-79.374110,GEORGE Restaurant,43.653346,-79.374445,Restaurant
3,The Grand Hotel & Suites Toronto,43.656449,-79.374110,Hokkaido Ramen Santouka らーめん山頭火,43.656435,-79.377586,Ramen Restaurant
4,The Grand Hotel & Suites Toronto,43.656449,-79.374110,Burrito Boyz,43.656265,-79.378343,Burrito Place
5,The Grand Hotel & Suites Toronto,43.656449,-79.374110,Schnitzel Queen,43.654239,-79.370533,German Restaurant
...,...,...,...,...,...,...,...
1572,Town Inn Suites,43.669056,-79.382573,Striker Sports Bar,43.665840,-79.386895,Sports Bar
1574,Town Inn Suites,43.669056,-79.382573,Rabba Fine Foods,43.672101,-79.384960,Grocery Store
1575,Town Inn Suites,43.669056,-79.382573,Croissant Tree,43.669575,-79.382331,Coffee Shop
1576,Town Inn Suites,43.669056,-79.382573,Subway,43.671755,-79.384877,Sandwich Place


[back to top](#table-of-contents)
<a id='4'></a>
## 4. Analyzing Hotel Amenities
Now we have all the data we need we can start processing it and analysing it. 
### 4.1 Creating categories

In [137]:
#one-hot encoding
hotel_onehot = pd.get_dummies(hotel_amenities[['Venue Category']], prefix="", prefix_sep="")

#adding hotel to dataframe
hotel_onehot['Hotel'] = hotel_amenities['Hotel']

#move hotel to be first column
fixed_columns = [hotel_onehot.columns[-1]] + list(hotel_onehot.columns[:-1])
hotel_onehot = hotel_onehot[fixed_columns]

hotel_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,...,Toy / Game Store,Track,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [138]:
hotel_onehot.shape

(1557, 186)

### 4.2 Frequency of Amenity
Now we group the rows by hotel and take the mean of the frequency of the occurence of each category

In [140]:
hotels_grouped = hotel_onehot.groupby('Hotel').mean().reset_index()
hotels_grouped

Unnamed: 0,Hotel,Yoga Studio,Adult Boutique,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,...,Toy / Game Store,Track,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store
0,Best Western Roehampton Hotel & Suites,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0
1,Cambridge Suites Toronto,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.02,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
2,Cosmopolitan Toronto Centre Hotel & Spa,0.010101,0.0,0.030303,0.0,0.020202,0.0,0.0,0.010101,0.010101,...,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.010101,0.0,0.0
3,Courtyard Mississauga-Airport Corporate Centre...,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Delta Hotels by Marriott Toronto,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Gecko Hospitality,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Hilton,0.0,0.0,0.010101,0.0,0.010101,0.0,0.010101,0.010101,0.0,...,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0
7,Hilton Garden Inn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Le Germain Hotel,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0
9,Marriott Downtown at CF Toronto Eaton Centre,0.0,0.0,0.010638,0.0,0.010638,0.010638,0.0,0.0,0.0,...,0.010638,0.0,0.0,0.010638,0.010638,0.010638,0.0,0.0,0.0,0.0


### 4.3 Most common venues for each hotel
Now we print each Hotel, along with the top 5 most common venues. After, we put it into a *pandas* dataframe in descending order.

In [141]:
num_top_venues = 5

for hotel in hotels_grouped['Hotel']:
    print("----"+hotel+"----")
    temp = hotels_grouped[hotels_grouped['Hotel'] == hotel].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Best Western Roehampton Hotel & Suites----
              venue  freq
0       Pizza Place  0.10
1       Coffee Shop  0.10
2               Gym  0.05
3  Sushi Restaurant  0.05
4      Dessert Shop  0.05


----Cambridge Suites Toronto----
            venue  freq
0  Clothing Store  0.06
1     Coffee Shop  0.05
2             Gym  0.04
3  Cosmetics Shop  0.04
4      Restaurant  0.04


----Cosmopolitan Toronto Centre Hotel & Spa----
                venue  freq
0         Coffee Shop  0.08
1  Seafood Restaurant  0.06
2          Restaurant  0.06
3                Café  0.05
4           Gastropub  0.04


----Courtyard Mississauga-Airport Corporate Centre West----
                  venue  freq
0   American Restaurant   0.2
1           Coffee Shop   0.2
2                   Gym   0.2
3           Yoga Studio   0.0
4  Pakistani Restaurant   0.0


----Delta Hotels by Marriott Toronto----
              venue  freq
0       Coffee Shop  0.08
1              Café  0.05
2    Scenic Lookout  0.03
3          

In [142]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Hotel']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Hotels_venues_sorted = pd.DataFrame(columns=columns)
Hotels_venues_sorted['Hotel'] = hotels_grouped['Hotel']

for ind in np.arange(hotels_grouped.shape[0]):
    Hotels_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hotels_grouped.iloc[ind, :], num_top_venues)

Hotels_venues_sorted.head()

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Best Western Roehampton Hotel & Suites,Pizza Place,Coffee Shop,Gym,Sushi Restaurant,Dessert Shop,Yoga Studio,Café,Seafood Restaurant,Sandwich Place,Restaurant
1,Cambridge Suites Toronto,Clothing Store,Coffee Shop,Gym,Cosmetics Shop,Restaurant,Bakery,Italian Restaurant,Café,Seafood Restaurant,Gastropub
2,Cosmopolitan Toronto Centre Hotel & Spa,Coffee Shop,Seafood Restaurant,Restaurant,Café,Gastropub,Japanese Restaurant,Italian Restaurant,Gym,American Restaurant,Cocktail Bar
3,Courtyard Mississauga-Airport Corporate Centre...,American Restaurant,Coffee Shop,Gym,Yoga Studio,Pakistani Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant
4,Delta Hotels by Marriott Toronto,Coffee Shop,Café,Scenic Lookout,Plaza,Baseball Stadium,Sandwich Place,Supermarket,Sports Bar,Concert Hall,Convenience Store


[back to top](#table-of-contents)
<a id='5'></a>
## 5. Clustering
Now we are going to cluster our hotels into similar categories. This has multiple implications. It could be used for fast sorting, and also to work out which 'clusters' of hotels are least common, in case you wanted to build a hotel that offered close distance to some amenities that are hard to get to. 
### 5.1 k-means clustering
now we run k-means clustering on our data.

In [144]:
# set numer of clusters
kclusters = 5

hotel_clusters = hotels_grouped.drop('Hotel', 1)

#runs k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=42).fit(hotel_clusters)

#check cluster labels for each row in dataframe
kmeans.labels_[0:10]

array([1, 2, 2, 4, 1, 0, 1, 3, 1, 1])

### 5.2 investigating clusters

In [149]:
# add clustering labels
Hotels_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

hotels_merged = T_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
hotels_merged = hotels_merged.join(Hotels_venues_sorted.set_index('Hotel'), on='Hotel')

hotels_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Hotel,Hotel Latitude,Hotel Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Garden District, Ryerson",The Grand Hotel & Suites Toronto,43.656449,-79.37411,1,Restaurant,Coffee Shop,Pizza Place,Theater,Music Venue,College Rec Center,Café,Camera Store,Pakistani Restaurant,Falafel Restaurant
1,"Garden District, Ryerson",Marriott Downtown at CF Toronto Eaton Centre,43.654728,-79.382422,1,Coffee Shop,Clothing Store,Diner,Breakfast Spot,Cosmetics Shop,Sandwich Place,Tanning Salon,Bubble Tea Shop,Office,Theater
2,St. James Town,The Omni King Edward Hotel,43.649191,-79.376006,2,Coffee Shop,Café,Restaurant,Seafood Restaurant,Bakery,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Gastropub,Gym
3,St. James Town,One King West Hotel & Residence,43.649139,-79.377876,2,Coffee Shop,Restaurant,Italian Restaurant,Seafood Restaurant,Café,Gym,Gastropub,Japanese Restaurant,American Restaurant,Cocktail Bar
4,Berczy Park,"The Westin Harbour Castle, Toronto",43.641211,-79.375749,1,Boat or Ferry,Coffee Shop,Sporting Goods Shop,Supermarket,Restaurant,Fried Chicken Joint,Bar,Park,Ramen Restaurant,Juice Bar


Finally, let's create a map

In [150]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hotels_merged['Hotel Latitude'], hotels_merged['Hotel Longitude'], hotels_merged['Hotel'], hotels_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 5.3 Examine Clusters
We will now examine the clusters, and which hotels are placed where. It will be interesting as we have 1 very populated clusters and 3 singlely-populated clusters

**Cluster 1**

In [151]:
hotels_merged.loc[hotels_merged['Cluster Labels'] ==0, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Gecko Hospitality,Park,Yoga Studio,Opera House,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub


**Cluster 2**

In [152]:
hotels_merged.loc[hotels_merged['Cluster Labels'] ==1, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Grand Hotel & Suites Toronto,Restaurant,Coffee Shop,Pizza Place,Theater,Music Venue,College Rec Center,Café,Camera Store,Pakistani Restaurant,Falafel Restaurant
1,Marriott Downtown at CF Toronto Eaton Centre,Coffee Shop,Clothing Store,Diner,Breakfast Spot,Cosmetics Shop,Sandwich Place,Tanning Salon,Bubble Tea Shop,Office,Theater
4,"The Westin Harbour Castle, Toronto",Boat or Ferry,Coffee Shop,Sporting Goods Shop,Supermarket,Restaurant,Fried Chicken Joint,Bar,Park,Ramen Restaurant,Juice Bar
5,Shangri-La Toronto,Coffee Shop,Bar,Café,Taco Place,Restaurant,Fast Food Restaurant,Seafood Restaurant,Mediterranean Restaurant,Pizza Place,Japanese Restaurant
6,Hilton,Coffee Shop,Café,Gym,Restaurant,Bar,Thai Restaurant,Lounge,Sushi Restaurant,Pizza Place,Burrito Place
8,Delta Hotels by Marriott Toronto,Coffee Shop,Café,Scenic Lookout,Plaza,Baseball Stadium,Sandwich Place,Supermarket,Sports Bar,Concert Hall,Convenience Store
9,Le Germain Hotel,Coffee Shop,Café,Restaurant,Japanese Restaurant,Sandwich Place,Boat or Ferry,Pub,Park,Supermarket,Steakhouse
10,Radisson Admiral Hotel Toronto-Harbourfront,Coffee Shop,Aquarium,Scenic Lookout,Gym,Pizza Place,Brewery,Restaurant,Bar,Boat or Ferry,Baseball Stadium
12,The Ritz-Carlton,Bar,Pizza Place,Café,Coffee Shop,Movie Theater,Theater,Fast Food Restaurant,Italian Restaurant,Restaurant,Roof Deck
16,Novotel Toronto North York,Pizza Place,Coffee Shop,Ramen Restaurant,Bank,Café,Korean Restaurant,Shopping Mall,Fast Food Restaurant,Restaurant,Bubble Tea Shop


**Cluster 3**

In [153]:
hotels_merged.loc[hotels_merged['Cluster Labels'] ==2, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,The Omni King Edward Hotel,Coffee Shop,Café,Restaurant,Seafood Restaurant,Bakery,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Gastropub,Gym
3,One King West Hotel & Residence,Coffee Shop,Restaurant,Italian Restaurant,Seafood Restaurant,Café,Gym,Gastropub,Japanese Restaurant,American Restaurant,Cocktail Bar
7,The Adelaide Hotel Toronto,Coffee Shop,Café,Gym,Japanese Restaurant,Italian Restaurant,Restaurant,Seafood Restaurant,American Restaurant,Gastropub,Cosmetics Shop
11,The Fairmont Royal York,Coffee Shop,Café,Steakhouse,Bakery,Sporting Goods Shop,Concert Hall,Sandwich Place,Salad Place,Restaurant,Pub
13,Cosmopolitan Toronto Centre Hotel & Spa,Coffee Shop,Seafood Restaurant,Restaurant,Café,Gastropub,Japanese Restaurant,Italian Restaurant,Gym,American Restaurant,Cocktail Bar
15,Cambridge Suites Toronto,Clothing Store,Coffee Shop,Gym,Cosmetics Shop,Restaurant,Bakery,Italian Restaurant,Café,Seafood Restaurant,Gastropub


**Cluster 4**

In [154]:
hotels_merged.loc[hotels_merged['Cluster Labels'] ==3, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Hilton Garden Inn,Coffee Shop,Fried Chicken Joint,Indian Restaurant,Comedy Club,Burrito Place,Burger Joint,Nightclub,Breakfast Spot,Sandwich Place,Middle Eastern Restaurant


**Cluster 5**

In [155]:
hotels_merged.loc[hotels_merged['Cluster Labels'] ==4, hotels_merged.columns[[1] + list(range(5, hotels_merged.shape[1]))]]

Unnamed: 0,Hotel,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Courtyard Mississauga-Airport Corporate Centre...,American Restaurant,Coffee Shop,Gym,Yoga Studio,Pakistani Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant
