# Journey Planner - Capstone Project

In this notebook, I will create a Journey Planner for a hypothetical trip to Toronto. 

## Problem Statement 
Every time I want to visit a new city, I want to make sure that I can visit the most representative places (POI - Point of Interests). Moreover, I want to find a strategic place for my hotel (starting point) and grouping the visits per each day in the most efficient manner. In my hypothetical trip to Toronto, I have 10 days of travel to schedule, to make the most out of it.
Therefore, with this project I want to: 
1. Find a method to cluster my POI, so that I can visit each day, places that are in the same area. 
2. Find a method to search for the best located hotel  

## Data and Methodology 
1. Collection of POI from the following website: https://theculturetrip.com - **Web scraping using BeautifulSoup**
2. Finding latitude and longitude of the POI found above - **Foursquare API call (Regular calls)**
3. Clustering the POI - **K-means Clustering** 
4. Clustering the city centre POI - **K-means Clustering**
5. Find the hotels around city centre - **Foursquare API call (Regular calls)**
6. Find Details around the hotels - **Foursquare API call (Premium calls)**
7. Find the best located hotel - **Google Maps, Directions API**
8. Show photos of the top 5 hotels - using data collected in point 6


## Libraries 
Set up the environment with the libraries that we will use 

In [1]:
import pandas as pd 
import requests
import numpy as np 
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import warnings
warnings.filterwarnings('ignore')

## 1. Collection of POI 
We are going to pick a website that collects the most iconic place of Toronto that they recommend to visit. 
Let's use BeautifulSoup for this task. I will analyse the `html` of the website using the "inspect element" tools and localise the titles of the places. Website scraped: https://theculturetrip.com/north-america/canada/articles/20-must-visit-attractions-in-toronto/

In [2]:
## Let's scrape the venues 

website_url = requests.get('https://theculturetrip.com/north-america/canada/articles/20-must-visit-attractions-in-toronto/').text

from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')

table = soup.find('section',{'class':'article-main-content-jsonstyled__ArticleMainContentJson-uefb7f-0 iSiOAF'})
content = table.find_all('article')


In [5]:
## Let's scrape the titles and collect them into the array "venues"

venues = []
for row in content: 
    element = row.find('h2').text
    venues.append(element)

In [6]:
## I need to change the name of the first title from 
## 'CN Tower and Edgewalk' to 'CN Tower and Edge walk' 
## because Foursuare will not recognise 'CN Tower and Edgewalk'
## without the space 

venues[0] = 'CN Tower and Edge walk'
venues

['CN Tower and Edge walk',
 'Ripley’s Aquarium of Canada',
 'Hockey Hall of Fame',
 'Casa Loma',
 'Royal Ontario Museum (ROM)',
 'Toronto Islands & Centreville',
 'St. Lawrence Market',
 'Toronto Eaton Centre',
 'Toronto Zoo',
 'Art Gallery of Ontario (AGO)',
 'High Park',
 'Ontario Science Centre',
 'Scarborough Bluffs',
 'Black Creek Pioneer Village',
 'Fort York National Historic Site',
 'Allen Gardens Conservatory',
 'Canada’s Wonderland']

## 2. Finding latitude and longitude of the POI 
We need to find the coordinates of the POI found in the previous step. These will help us cluster them into groups in the following step (#3). <br>
Use the **Foursquare API (Regular calls)** in order to get this information. 

In [213]:
## Infomration that we will use for API calls

CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
near = 'Toronto'

In [8]:
## Prepare a new dataframe to contain the results 

columns = ["venue", "id", "longitude", "latitude"]
venues_df = pd.DataFrame(columns = columns)

In [9]:
## Collect the information while using API calls 
## location ID, longitude and latitude  

for venue in venues: 
    search_query = venue
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&near={}&query={}'.format(CLIENT_ID, 
                                                                                                                   CLIENT_SECRET, 
                                                                                                                   VERSION, 
                                                                                                                   near, 
                                                                                                                   search_query)
    results = requests.get(url).json()
    venue_id = results['response']['venues'][0]["id"]
    latitude = results['response']['venues'][0]["location"]["lat"]
    longitude = results['response']['venues'][0]["location"]["lng"]
    venues_df = venues_df.append({"venue": venue, 
                     "id": venue_id, 
                     "latitude": latitude, 
                     "longitude": longitude}, ignore_index = True)

In [10]:
venues_df

Unnamed: 0,venue,id,longitude,latitude
0,CN Tower and Edge walk,53c96be1498e19b82b550114,-79.387143,43.642465
1,Ripley’s Aquarium of Canada,4f05da6f0e61b14c28b2e05f,-79.386252,43.642104
2,Hockey Hall of Fame,4ad4c05ef964a520d8f620e3,-79.377323,43.646974
3,Casa Loma,4ced82135de16ea89e89b696,-79.410732,43.676314
4,Royal Ontario Museum (ROM),4e3bef1b3151eaa7c43a52e5,-79.393801,43.668226
5,Toronto Islands & Centreville,4ad4c05ef964a5209af620e3,-79.378495,43.622112
6,St. Lawrence Market,4ad4c062f964a520fbf720e3,-79.371597,43.648743
7,Toronto Eaton Centre,4ad77a12f964a520260b21e3,-79.380633,43.654877
8,Toronto Zoo,4ad4c05ef964a52093f620e3,-79.181551,43.820582
9,Art Gallery of Ontario (AGO),4ad4c05ef964a520daf620e3,-79.392922,43.654003


## 3. Clustering the POI
We have 17 locations that we want to visit - how to cluster them? I decided to use their coordinates so that closer locations could stay in the same cluster - Technique used: Partitional Clustering - **K-means Clustering**. 
<br>
<br>
The initial number for cluster is 7 because I am expecting that most of the attractions are going to be in one group in the city centre and consequently I will need to divide them further more, afterwards.  

In [70]:
# set number of clusters
kclusters = 7

# for clustering I want to use only the coordinates so prepare the df for this 
venues_for_clustering = venues_df.drop(["venue", "id"], 1)

# run k-means clustering
kmeans1 = KMeans(n_clusters=kclusters, random_state=0).fit(venues_for_clustering)

# print the lables of the clusters 
kmeans1.labels_[0:20] 

array([1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 2, 3, 0, 6, 1, 1, 4], dtype=int32)

In [71]:
## Print the Centroids of each cluster 

clusters_toronto = kmeans1.cluster_centers_[0:7]
clusters_toronto

array([[  0.        , -79.23723936,  43.70777964],
       [  4.        , -79.38729204,  43.65046351],
       [  2.        , -79.46342468,  43.64647917],
       [  6.        , -79.34018861,  43.71609345],
       [  5.        , -79.542856  ,  43.84162   ],
       [  3.        , -79.18155126,  43.8205819 ],
       [  1.        , -79.5169914 ,  43.77339343]])

In [75]:
# Add this information into the data frame 
venues_df.insert(0, 'Cluster Labels 1', kmeans1.labels_)

In [77]:
venues_df

Unnamed: 0,Cluster Labels 1,venue,id,longitude,latitude
0,1,CN Tower and Edge walk,53c96be1498e19b82b550114,-79.387143,43.642465
1,1,Ripley’s Aquarium of Canada,4f05da6f0e61b14c28b2e05f,-79.386252,43.642104
2,1,Hockey Hall of Fame,4ad4c05ef964a520d8f620e3,-79.377323,43.646974
3,1,Casa Loma,4ced82135de16ea89e89b696,-79.410732,43.676314
4,1,Royal Ontario Museum (ROM),4e3bef1b3151eaa7c43a52e5,-79.393801,43.668226
5,1,Toronto Islands & Centreville,4ad4c05ef964a5209af620e3,-79.378495,43.622112
6,1,St. Lawrence Market,4ad4c062f964a520fbf720e3,-79.371597,43.648743
7,1,Toronto Eaton Centre,4ad77a12f964a520260b21e3,-79.380633,43.654877
8,5,Toronto Zoo,4ad4c05ef964a52093f620e3,-79.181551,43.820582
9,1,Art Gallery of Ontario (AGO),4ad4c05ef964a520daf620e3,-79.392922,43.654003


In [76]:
## Toronto 
toronto_latitude = 43.658000
toronto_longitude = -79.375000

# create map
map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(venues_df['latitude'], venues_df['longitude'], venues_df['venue'], venues_df['Cluster Labels 1']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

       
map_clusters

<img src="3.Clustering.png">

### Clustering Results 

As anticipated, most of the clusters are positioned far away from the city centre. 
- Day 1: Black Creek Pioneer Village (Orange dot)
- Day 2: Canada's Wonderland (Green Dot)
- Day 3: High Park (Blue dot)
- Day 4: Scarborough Bluffs (Red Dot)
- Day 5: Toronto Zoo (yellow dot)
- Day 6: Ontario Science Centre (light blue)
- the rest: City Centre (purple dots)

These also are fortunately destinations where spending the whole day is considered appropriate. 
"Day 6" can be further down divided in more clusters since a lot of destinations were clustered together. 

From day 1 to day 5 we can rent a car and the remaining 4 days in city centre we can use the public transportation. 

## 4. Clustering the city centre POI
The city centre locations (purple points) are numerous and difficult to visit all in one day... Let's cluster them in the remaining 4 clusters (remaining 4 days). 



In [84]:
## The centroid of the Cluster 1, city center is : 

citycentre_centroid_lat =clusters_toronto[1][2]
citycentre_centroid_lng =clusters_toronto[1][1]
print ("The centroid of the city centre cluster is: lat:{}, lng: {}".format(citycentre_centroid_lat, 
                                                                            citycentre_centroid_lng))

The centroid of the city centre cluster is: lat:43.65046351331826, lng: -79.38729204111681


In [80]:
## Filter the dataframe so we have only the city centre 

venues_city_centre = venues_df.loc[venues_df["Cluster Labels 1"] == 1]
venues_city_centre

Unnamed: 0,Cluster Labels 1,venue,id,longitude,latitude
0,1,CN Tower and Edge walk,53c96be1498e19b82b550114,-79.387143,43.642465
1,1,Ripley’s Aquarium of Canada,4f05da6f0e61b14c28b2e05f,-79.386252,43.642104
2,1,Hockey Hall of Fame,4ad4c05ef964a520d8f620e3,-79.377323,43.646974
3,1,Casa Loma,4ced82135de16ea89e89b696,-79.410732,43.676314
4,1,Royal Ontario Museum (ROM),4e3bef1b3151eaa7c43a52e5,-79.393801,43.668226
5,1,Toronto Islands & Centreville,4ad4c05ef964a5209af620e3,-79.378495,43.622112
6,1,St. Lawrence Market,4ad4c062f964a520fbf720e3,-79.371597,43.648743
7,1,Toronto Eaton Centre,4ad77a12f964a520260b21e3,-79.380633,43.654877
9,1,Art Gallery of Ontario (AGO),4ad4c05ef964a520daf620e3,-79.392922,43.654003
14,1,Fort York National Historic Site,5660732c498eee768e7672c8,-79.406626,43.637364


In [216]:
## Clean the dataframe for the clustering- leave only the coordinates

venues_citycentre_for_clustering = venues_city_centre.drop(["venue", "id", "Cluster Labels 1"], 1)

In [217]:
# run k-means clustering
# initial cluster is 4 

kclusters = 4
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(venues_citycentre_for_clustering)
venues_city_centre.insert(0, 'Cluster Labels 2', kmeans2.labels_)
venues_city_centre

Unnamed: 0,Cluster Labels 2,Cluster Labels 1,venue,id,longitude,latitude
0,2,1,CN Tower and Edge walk,53c96be1498e19b82b550114,-79.387143,43.642465
1,2,1,Ripley’s Aquarium of Canada,4f05da6f0e61b14c28b2e05f,-79.386252,43.642104
2,0,1,Hockey Hall of Fame,4ad4c05ef964a520d8f620e3,-79.377323,43.646974
3,3,1,Casa Loma,4ced82135de16ea89e89b696,-79.410732,43.676314
4,3,1,Royal Ontario Museum (ROM),4e3bef1b3151eaa7c43a52e5,-79.393801,43.668226
5,1,1,Toronto Islands & Centreville,4ad4c05ef964a5209af620e3,-79.378495,43.622112
6,0,1,St. Lawrence Market,4ad4c062f964a520fbf720e3,-79.371597,43.648743
7,0,1,Toronto Eaton Centre,4ad77a12f964a520260b21e3,-79.380633,43.654877
9,2,1,Art Gallery of Ontario (AGO),4ad4c05ef964a520daf620e3,-79.392922,43.654003
14,2,1,Fort York National Historic Site,5660732c498eee768e7672c8,-79.406626,43.637364


In [220]:
## Collect the cetroids of the clsuters 

clusters_citycentre = kmeans2.cluster_centers_[0:4] 
clusters_citycentre

array([[-79.37606017,  43.65312749],
       [-79.37849522,  43.62211231],
       [-79.39323573,  43.64398408],
       [-79.40226681,  43.67227003]])

In [221]:
## Toronto 
toronto_latitude = 43.653963
toronto_longitude = -79.387207

# create map
map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=13)

# set color scheme for the clusters


colors = ["red", "blue", "green", "orange"]

# add markers to the map
markers_colors = []
folium.Marker(
        [citycentre_centroid_lat, citycentre_centroid_lng], 
        popup = "Centroid of city centre",
        icon=folium.Icon(icon='flag', color= "darkpurple")
        #color = "red"
    ).add_to(map_clusters)
for lat, lon, poi, cluster in zip(venues_city_centre['latitude'], venues_city_centre['longitude'], venues_city_centre['venue'], venues_city_centre['Cluster Labels 2']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=colors[cluster],
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.7).add_to(map_clusters)
for i in range(len(clusters_citycentre)): 
    folium.Marker(
        [clusters_citycentre[i][1], clusters_citycentre[i][0]], 
        #radius = 5, 
        popup = "Centroid for cluster " + str(i),
        icon=folium.Icon(icon='flag', color= colors[i])
        #color = "red"
    ).add_to(map_clusters)
    folium.CircleMarker(
        [clusters_citycentre[i][1], clusters_citycentre[i][0]],
        radius=80,
        #popup=label,
        color=colors[i],
        fill=True,
        fill_color=colors[i],
        fill_opacity=0.2).add_to(map_clusters)
       
map_clusters

<img src="4.Clustering.png">

### Clustering Results 

The purple flag is the centroid of the City centre group discovered in the previous step. 

The city centres clusters found were the following:
- Day 7 - Orange Group : Royal Ontario Museum , Casa Loma
- Day 8 - Red Group :  Allen Gardens, TOronto Eaton Centre, St. Lawrence Market, Hokey Hall of fame 
- Day 9 - Green Group: Art Gallery of Ontario, Ripley's Aquarium, CN Tower and Edge Walk 
- Day 10 - Blue Group: Toronto Islands & Centreville 	

In the centre of each group, I have highlighted the centroids with a flag icon. 


## 5. Find the hotels around city centre
Search for the 40 hotels around the city centre centroid (purple flag). <br>
This is one regular call using Foursquare API. 


In [30]:
## Prepare new data frame to collect information 

columns = ["hotel", 
           "id", 
           "address", 
           "latitude", 
           "longitude",
           "distance",
           "ratings", 
           "description", 
           "website", 
           "photo_prefix", 
           "photo_suffix", 
           "photo_width", 
           "photo_height"]
hotel_df = pd.DataFrame(columns = columns)

In [31]:
## Foursquare API call 

latitude = citycentre_centroid_lat
longitude = citycentre_centroid_lng
query = "hotel"
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&query={}&limit=40'.format(CLIENT_ID, 
                                                                                                                CLIENT_SECRET, 
                                                                                                                VERSION, 
                                                                                                                latitude,
                                                                                                                longitude,
                                                                                                                query)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c8c2604db04f50db075e033'},
 'response': {'venues': [{'id': '4b930a47f964a520c43034e3',
    'name': 'Hilton Toronto Airport Hotel & Suites',
    'location': {'address': '5875 Airport Road',
     'crossStreet': 'Highway 427',
     'lat': 43.68667684394911,
     'lng': -79.60424870252609,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.68667684394911,
       'lng': -79.60424870252609}],
     'distance': 11935,
     'postalCode': 'L4V 1N1',
     'cc': 'CA',
     'city': 'Mississauga',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['5875 Airport Road (Highway 427)',
      'Mississauga ON L4V 1N1',
      'Canada']},
    'categories': [{'id': '4bf58dd8d48988d1fa931735',
      'name': 'Hotel',
      'pluralName': 'Hotels',
      'shortName': 'Hotel',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1552688644',
    

In [33]:
## Fill the dataframe with the new infomration 

for i in range(len(results['response']['venues'])):
    venue = results['response']['venues'][i]["name"]
    venue_id = results['response']['venues'][i]["id"]
    formattedAddress1 = results['response']['venues'][i]["location"]["formattedAddress"][0]
    try: 
        formattedAddress2 = results['response']['venues'][i]["location"]["postalCode"]
    except: 
        pass
    distance = results['response']['venues'][i]["location"]["distance"]
    address = str(formattedAddress1)+ " " +str(formattedAddress2)
    latitude = results['response']['venues'][i]["location"]["lat"]
    longitude = results['response']['venues'][i]["location"]["lng"]
    hotel_df = hotel_df.append({"hotel": venue, 
                         "id": venue_id, 
                         "address": address,
                         "latitude": latitude, 
                         "longitude": longitude, 
                         "distance": distance}, ignore_index = True)

In [222]:
hotel_df.head()

Unnamed: 0,hotel,id,address,latitude,longitude,distance,ratings,description,website,photo_prefix,photo_suffix,photo_width,photo_height
0,Hilton Toronto Airport Hotel & Suites,4b930a47f964a520c43034e3,5875 Airport Road (Highway 427) L4V 1N1,43.686677,-79.604249,11935,6.1,,http://www3.hilton.com/en/hotels/ontario/hilto...,https://fastly.4sqi.net/img/general/,/ZpxUM1yGlYIuBcIbKAk_WTDeUA-9NdIoTPCanyNCuk8.jpg,612.0,612.0
1,Sheraton Parkway Toronto North Hotel & Suites,4add3b3df964a520b36421e3,600 Highway 7 (at Leslie St) L4B 1B2,43.845757,-79.381,13573,5.6,Ideally Located Convention And Suite Hotel Clo...,https://www.marriott.com/hotels/travel/yyzsi-s...,https://fastly.4sqi.net/img/general/,/2028923_OdF4Nh1PTkhNPxtnGjkRifWJj6GqV3BhX_O31...,612.0,612.0
2,Sheraton Toronto Airport Hotel & Conference Ce...,4add3a4ff964a520ac6421e3,801 Dixon Road (at Skyway Ave) M9W 1J5,43.686855,-79.587497,11178,6.7,Sheraton Toronto Airport Hotel & Conference Ce...,https://www.marriott.com/hotels/travel/yyzds-s...,https://fastly.4sqi.net/img/general/,/212169387_C2-Wh9iB-a9yFzkkMFUuT3RsxV7Rna5lQWh...,3888.0,2592.0
3,One King West Hotel & Residence,4af96fbbf964a520c01122e3,1 King St. W. (at Yonge St.) M5H 1A1,43.648947,-79.377966,17805,8.0,One King West Hotel & Residence is located in ...,http://www.onekingwest.com,,,,
4,Radisson Suite Hotel Toronto Airport,4be0a3f198f2a59356eec25a,640 Dixon Road M9W 1J1,43.692173,-79.576619,10234,5.8,Radisson Suite Hotel Toronto Airport lies just...,https://www.radisson.com/toronto-hotel-on-m9w1...,https://fastly.4sqi.net/img/general/,/Q0QZE3B0TXRETICBAFGYLXDK2BSX5RHBG3PMS1PL1KJU1...,540.0,540.0


## 6. Find Details of the hotels: ratings, website etc...
In order to choose my hotel, I want to know the ratings, description, the website, and an image for each hotel of the area. <br>
This is done by foursquare API (premium calls).  

In [53]:


for i in range(len(hotel_df)):
    venue_id = hotel_df["id"][i]
    print(venue_id)
    venue_details = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, 
                                                                                          CLIENT_ID, 
                                                                                          CLIENT_SECRET, 
                                                                                          VERSION)
    try: 
        results = requests.get(venue_details).json()
    except Exception as e: 
        print("not able to get the results")
        print(e)
    try: 
        rating = results['response']['venue']["rating"]
    except:
        rating = "NA"
    try:
        description = results['response']['venue']["description"]
    except: 
        description = "NA"
    try:
        website = results['response']['venue']["url"]
    except: 
        website = "NA"
    try: 
        prefix = results['response']['venue']["photos"]["groups"][1]["items"][0]["prefix"]
        suffix = results['response']['venue']["photos"]["groups"][1]["items"][0]["suffix"]
        width = results['response']['venue']["photos"]["groups"][1]["items"][0]["width"]
        height = results['response']['venue']["photos"]["groups"][1]["items"][0]["height"]
    except: 
        prefix = "NA"
        suffix = "NA"
        width = "NA"
        height = "NA"
    
    hotel_df["ratings"][i]= rating
    hotel_df["description"][i]= description 
    hotel_df["website"][i]= website
    hotel_df["photo_prefix"][i]= prefix 
    hotel_df["photo_suffix"][i]= suffix 
    hotel_df["photo_width"][i]= width 
    hotel_df["photo_height"][i]= height
        
    print(venue_id, rating, description, website, prefix)
        
    
    

4b930a47f964a520c43034e3
4b930a47f964a520c43034e3 6.1 NA http://www3.hilton.com/en/hotels/ontario/hilton-toronto-airport-hotel-and-suites-YYZHIHH/index.html https://fastly.4sqi.net/img/general/
4add3b3df964a520b36421e3
4add3b3df964a520b36421e3 5.6 Ideally Located Convention And Suite Hotel Close To Downtown Toronto. Near Shopping, Restaurants, Golf Courses And Canada's Wonderland. https://www.marriott.com/hotels/travel/yyzsi-sheraton-parkway-toronto-north-hotel-and-suites/ https://fastly.4sqi.net/img/general/
4add3a4ff964a520ac6421e3
4add3a4ff964a520ac6421e3 6.7 Sheraton Toronto Airport Hotel & Conference Centre is a contemporary hotel located moments from Toronto Pearson Airport. Enjoy a complimentary airport shuttle, free on-site parking and well-appointed accommodations. https://www.marriott.com/hotels/travel/yyzds-sheraton-toronto-airport-hotel-and-conference-centre/ https://fastly.4sqi.net/img/general/
4af96fbbf964a520c01122e3
4af96fbbf964a520c01122e3 8.0 One King West Hotel & Res

4adf7d0bf964a520127b21e3 8.2 Located in the heart of Downtown Toronto, this legendary urban oasis features a fine dining restaurant, retail shops, and afternoon tea. https://www.omnihotels.com/hotels/toronto-king-edward https://fastly.4sqi.net/img/general/
4b7efbeff964a520450e30e3
4b7efbeff964a520450e30e3 8.6 Ultra-chic downtown Toronto hotel in a glass-fronted building is a 4-minute drive from the Bathurst Street Terminal's ferries from Billy Bishop Toronto City Airport. http://www.facebook.com/thompsonhotels https://fastly.4sqi.net/img/general/
4ae61cf6f964a520caa421e3
4ae61cf6f964a520caa421e3 6.5 NA NA https://fastly.4sqi.net/img/general/
51e48697498eded9073c6c17
51e48697498eded9073c6c17 6.4 NA NA https://fastly.4sqi.net/img/general/
520e1ab1498e2591863f2ca3
520e1ab1498e2591863f2ca3 NA NA NA https://fastly.4sqi.net/img/general/
4ad4c05cf964a520b6f520e3
4ad4c05cf964a520b6f520e3 5.4 NA http://internationalplazahotel.com https://fastly.4sqi.net/img/general/
4ff8fa85e4b003534cce7955
4ff

In [223]:

hotel_df.head()

Unnamed: 0,hotel,id,address,latitude,longitude,distance,ratings,description,website,photo_prefix,photo_suffix,photo_width,photo_height
0,Hilton Toronto Airport Hotel & Suites,4b930a47f964a520c43034e3,5875 Airport Road (Highway 427) L4V 1N1,43.686677,-79.604249,11935,6.1,,http://www3.hilton.com/en/hotels/ontario/hilto...,https://fastly.4sqi.net/img/general/,/ZpxUM1yGlYIuBcIbKAk_WTDeUA-9NdIoTPCanyNCuk8.jpg,612.0,612.0
1,Sheraton Parkway Toronto North Hotel & Suites,4add3b3df964a520b36421e3,600 Highway 7 (at Leslie St) L4B 1B2,43.845757,-79.381,13573,5.6,Ideally Located Convention And Suite Hotel Clo...,https://www.marriott.com/hotels/travel/yyzsi-s...,https://fastly.4sqi.net/img/general/,/2028923_OdF4Nh1PTkhNPxtnGjkRifWJj6GqV3BhX_O31...,612.0,612.0
2,Sheraton Toronto Airport Hotel & Conference Ce...,4add3a4ff964a520ac6421e3,801 Dixon Road (at Skyway Ave) M9W 1J5,43.686855,-79.587497,11178,6.7,Sheraton Toronto Airport Hotel & Conference Ce...,https://www.marriott.com/hotels/travel/yyzds-s...,https://fastly.4sqi.net/img/general/,/212169387_C2-Wh9iB-a9yFzkkMFUuT3RsxV7Rna5lQWh...,3888.0,2592.0
3,One King West Hotel & Residence,4af96fbbf964a520c01122e3,1 King St. W. (at Yonge St.) M5H 1A1,43.648947,-79.377966,17805,8.0,One King West Hotel & Residence is located in ...,http://www.onekingwest.com,,,,
4,Radisson Suite Hotel Toronto Airport,4be0a3f198f2a59356eec25a,640 Dixon Road M9W 1J1,43.692173,-79.576619,10234,5.8,Radisson Suite Hotel Toronto Airport lies just...,https://www.radisson.com/toronto-hotel-on-m9w1...,https://fastly.4sqi.net/img/general/,/Q0QZE3B0TXRETICBAFGYLXDK2BSX5RHBG3PMS1PL1KJU1...,540.0,540.0


In [55]:
# Save the data frame into a CSV (since you have only 50 calls per day for free)

hotel_df.to_csv("hotel_dataframe.csv", encoding='utf-8')

### Visulize the hotels 

In [87]:
# create map
hotel_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=11)

# set color scheme for the clusters


colors = ["red"]

# add markers to the map
markers_colors = []
folium.Marker(
        [citycentre_centroid_lat, citycentre_centroid_lng], 
        popup = "Centroid of city centre",
        icon=folium.Icon(icon='flag', color= "darkpurple")
    ).add_to(hotel_clusters)
folium.CircleMarker(
        [citycentre_centroid_lat, citycentre_centroid_lng],
        radius=50,
        popup=label,
        color="purple",
        fill=True,
        fill_color="purple",
        fill_opacity=0.7).add_to(hotel_clusters)
for lat, lon, poi in zip(hotel_df['latitude'], hotel_df['longitude'], hotel_df['hotel']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=colors[0],
        fill=True,
        fill_color=colors[0],
        fill_opacity=0.7).add_to(hotel_clusters)


       
hotel_clusters

<img src="6.hotels.png">

## Filtering Hotels 

I want to filter my hotels: 
1. all the hotels with **no distance or rating** information will not be used. 
2. I want a hotel **close to the city centre** : 18 kilometers max from the centroid of city centre (purple flag)
3. The minimum rating needs to be **at least 8**

In [93]:
hotel_df_clean = hotel_df.loc[(hotel_df["distance"]!= "NA") & (hotel_df["ratings"]!= "NA")]
hotel_df_clean[["distance", "ratings"]] = hotel_df_clean[["distance", "ratings"]].apply(pd.to_numeric)
hotel_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 35 entries, 0 to 38
Data columns (total 13 columns):
hotel           35 non-null object
id              35 non-null object
address         35 non-null object
latitude        35 non-null float64
longitude       35 non-null float64
distance        35 non-null int64
ratings         35 non-null float64
description     35 non-null object
website         35 non-null object
photo_prefix    35 non-null object
photo_suffix    35 non-null object
photo_width     35 non-null object
photo_height    35 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 3.8+ KB


In [111]:
range_meters = 18000.00
rating_minimum = 8.00


In [112]:
close_hotels = hotel_df_clean.loc[(hotel_df_clean["distance"]<range_meters) & (hotel_df_clean["ratings"]>rating_minimum)]
close_hotels.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12 entries, 7 to 34
Data columns (total 13 columns):
hotel           12 non-null object
id              12 non-null object
address         12 non-null object
latitude        12 non-null float64
longitude       12 non-null float64
distance        12 non-null int64
ratings         12 non-null float64
description     12 non-null object
website         12 non-null object
photo_prefix    12 non-null object
photo_suffix    12 non-null object
photo_width     12 non-null object
photo_height    12 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 1.3+ KB


In [None]:
close_hotels = close_hotels.reset_index()

## 7. Find the best located hotel
I reduced the number of the possible hotels, based on my necessity. <br>
Now, I want to find the best located hotel based on my 4 clusters that I need to visit. <br>
The theory is that I am going to **calculate the travel time (in minutes) from the hotel to the centroids of each group**, and the best performing hotels will be selected.  <br>
In order to calculate the travel time, I am using the **Google Maps - Distance API**. 

Since it's not recommended to use the car in the city centre but better to use the public transportation I am going to calculate the travel time using the **"transit" mode**. 


### Small Correction: 
I need to change the centre of the second centroid (Toronto Islands & Centreville) - Apparenlty Google API struggles to find the route if we leave the centroid on the island so I have changed the position of the centroid to Jack Layton Ferry Terminal, 9 Queens Quay W, Toronto, ON M5J 2H3, Canada where the Ferry to the island will depart. The coordinates are taken from Wikipedia. 

In [176]:
# Change the centre of the Toronto Islands & Centreville cluster. 

clusters_citycentre[1][0] = -79.375278
clusters_citycentre[1][1] = 43.640278

In [177]:
clusters_citycentre

array([[-79.37606017,  43.65312749],
       [-79.375278  ,  43.640278  ],
       [-79.39323573,  43.64398408],
       [-79.40226681,  43.67227003]])

In [106]:
# Google API

API_key = ""

In [180]:
mode = "transit"
close_hotels["to_cluster_1"] = ""
close_hotels["to_cluster_2"] = ""
close_hotels["to_cluster_3"] = ""
close_hotels["to_cluster_4"] = ""

for i in range(len(close_hotels)): 
    origin = str(close_hotels["latitude"][i]) + "," + str(close_hotels["longitude"][i])
    for j in range(len(clusters_citycentre)):
        destination = str(clusters_citycentre[j][1]) + "," + str(clusters_citycentre[j][0])
        url = "https://maps.googleapis.com/maps/api/directions/json?origin="+origin+"&destination="+destination+"&mode="+mode+"&key=" + API_key
        response = requests.get(url)
        response_json = response.json()
        if response_json["status"] == "OK":
            duration  = response_json["routes"][0]["legs"][0]["duration"]["value"]/60.00 # the durantion is in minutes
            close_hotels["to_cluster_"+str(j+1)][i] = duration
        else: 
            close_hotels["to_cluster_"+str(j+1)][i] = "not available"
        print(i, close_hotels["hotel"][i], j, duration)
        

0 Hotel Gelato 0 25.7
0 Hotel Gelato 1 34.86666666666667
0 Hotel Gelato 2 47.03333333333333
0 Hotel Gelato 3 34.766666666666666
1 Alt Hotel Toronto Airport 0 62.6
1 Alt Hotel Toronto Airport 1 57.11666666666667
1 Alt Hotel Toronto Airport 2 65.65
1 Alt Hotel Toronto Airport 3 71.95
2 Gladstone Hotel 0 17.066666666666666
2 Gladstone Hotel 1 35.916666666666664
2 Gladstone Hotel 2 17.316666666666666
2 Gladstone Hotel 3 34.78333333333333
3 Four Seasons Hotel Toronto 0 17.116666666666667
3 Four Seasons Hotel Toronto 1 25.033333333333335
3 Four Seasons Hotel Toronto 2 25.216666666666665
3 Four Seasons Hotel Toronto 3 16.816666666666666
4 The Grand Hotel & Suites Toronto 0 7.116666666666666
4 The Grand Hotel & Suites Toronto 1 19.35
4 The Grand Hotel & Suites Toronto 2 22.733333333333334
4 The Grand Hotel & Suites Toronto 3 30.133333333333333
5 The Drake Hotel 0 14.95
5 The Drake Hotel 1 22.133333333333333
5 The Drake Hotel 2 16.283333333333335
5 The Drake Hotel 3 27.316666666666666
6 The Rex

In [182]:
close_hotels["total_duration"] = close_hotels["to_cluster_1"] +close_hotels["to_cluster_2"]+close_hotels["to_cluster_3"]+close_hotels["to_cluster_4"]
close_hotels = close_hotels.sort_values(by =["total_duration"])
close_hotels.reset_index(drop=True, inplace=True)

In [191]:
## Save the data into csv just in case 

close_hotels.to_csv("close_hotels.csv")


In [224]:
close_hotels.head()

Unnamed: 0,level_0,index,hotel,id,address,latitude,longitude,distance,ratings,description,website,photo_prefix,photo_suffix,photo_width,photo_height,to_cluster_1,to_cluster_2,to_cluster_3,to_cluster_4,total_duration
0,9,29,Cosmopolitan Toronto Centre Hotel & Spa,4ad4c05cf964a520bbf520e3,8 Colborne St (at Yonge St) M5E 1E1,43.649064,-79.377598,17814,8.1,,,https://fastly.4sqi.net/img/general/,/13967990_IRPHMiAzV9Z-m_pLnLJoFCMVglKbvzWkjFBE...,612,612,5.6,10.1,11.8833,19.8,47.3833
1,6,17,The Rex Hotel Jazz & Blues Bar,4b68aed1f964a520de862be3,194 Queen St W (Queen & St. Patrick) M5V 1Z1,43.650505,-79.388577,17143,8.1,"""More Great Jazz than anywhere else, all the t...",http://therex.ca,https://fastly.4sqi.net/img/general/,/62225795_xhuWgSWcdc3uor6EResNeiUvgIuiWp1QdO-b...,720,402,6.85,12.8167,11.1833,17.5667,48.4167
2,10,33,The Omni King Edward Hotel,4adf7d0bf964a520127b21e3,37 King Street East (bwtn Victoria St. and Tor...,43.649191,-79.376006,17884,8.2,"Located in the heart of Downtown Toronto, this...",https://www.omnihotels.com/hotels/toronto-king...,https://fastly.4sqi.net/img/general/,/26805235_VCSg-WfO8LJ5kkCMYMRlJvdC4b-jIWJILrFz...,959,717,7.63333,10.3167,13.5,22.1167,53.5667
3,8,28,Le Germain Hotel Toronto Mercer,4ad4c05cf964a520baf520e3,30 Mercer St M5V 1H3,43.645669,-79.391044,17460,8.3,"Choose Le Germain Hotel Toronto for elegance, ...",http://www.hotelboutique.com,https://fastly.4sqi.net/img/general/,/12853417_5AZ1YuWKbmCHB0yxlT8LlHztbNJp_BxLlHhK...,591,813,13.25,20.3167,4.18333,27.75,65.5
4,4,14,The Grand Hotel & Suites Toronto,4b7d9098f964a52014c72fe3,225 Jarvis St. (at Dundas St. E.) M5B 2C1,43.656449,-79.37411,17367,9.0,,http://www.grandhoteltoronto.com,https://fastly.4sqi.net/img/general/,/43228399_OgJeFnQuE733U7O5vg-3lxLokM6TUL8C4B12...,720,960,7.11667,19.35,22.7333,30.1333,79.3333


## 8. Show photos of the top 5 best located hotels 
I want to see now the best located hotels. <br>
Where are they located and if we can get at least one photo for each hotel. 

In [193]:
top_5_hotel = close_hotels[0:5]

In [199]:
# create map
hotel_final = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=13)

# set color scheme for the clusters


colors = ["yellow"]

# add markers to the map
markers_colors = []
folium.Marker(
        [citycentre_centroid_lat, citycentre_centroid_lng], 
        popup = "Centroid of city centre",
        icon=folium.Icon(icon='flag', color= "darkpurple")
    ).add_to(hotel_final)
folium.CircleMarker(
        [citycentre_centroid_lat, citycentre_centroid_lng],
        radius=110,
        popup=label,
        color="purple",
        fill=True,
        fill_color="purple",
        fill_opacity=0.3).add_to(hotel_final)
for lat, lon, poi in zip(top_5_hotel['latitude'], top_5_hotel['longitude'], top_5_hotel['hotel']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=colors[0],
        fill=True,
        fill_color=colors[0],
        fill_opacity=0.7).add_to(hotel_final)

       
hotel_final

<img src="8.top5.png">

In [228]:
from IPython.display import Image
from IPython.core.display import HTML 

for i in range(len(top_5_hotel)):
    print(top_5_hotel["hotel"][i])
    print("website: " + top_5_hotel["website"][i])
    print("rating: " + str(top_5_hotel["ratings"][i]))
    display(Image(url= str(top_5_hotel["photo_prefix"][i])+str(top_5_hotel["photo_width"][i])+"x"+str(top_5_hotel["photo_height"][i])+str(top_5_hotel["photo_suffix"][i]), width=300))



Cosmopolitan Toronto Centre Hotel & Spa
website: NA
rating: 8.1


The Rex Hotel Jazz & Blues Bar
website: http://therex.ca
rating: 8.1


The Omni King Edward Hotel
website: https://www.omnihotels.com/hotels/toronto-king-edward
rating: 8.2


Le Germain Hotel Toronto Mercer
website: http://www.hotelboutique.com
rating: 8.3


The Grand Hotel & Suites Toronto
website: http://www.grandhoteltoronto.com
rating: 9.0


## Results 

This notebook was created to facilitate the organization and scheduling of a hypothetical trip to Toronto. 
We trusted a website that recommended to visit 17 places in Toronto and we have mapped those destination and clustered them based on their coordinates. (Foursquare API to collect coordinates) 

The initial clustering was done by only using 7 clusters (k-means) because the destinations were already far apart from each other and we were already planning to spend the whole day on each of the POI far from the city centre. Moreover, we could group the days where we needed to rent a car in order to reach the distant destinations. 

- **Day 1**: Black Creek Pioneer Village (Orange dot)
- **Day 2**: Canada's Wonderland (Green Dot)
- **Day 3**: High Park (Blue dot)
- **Day 4**: Scarborough Bluffs (Red Dot)
- **Day 5**: Toronto Zoo (yellow dot)
- **Day 6** : Ontario Science Centre (light blue)

The remaining days are dedicated to explore the city centre. Since we are going to use the public transportation we can save money and reduce the rental of the car to 6 days instead of 10 days. 

- **Day 7** - Orange Group : Royal Ontario Museum , Casa Loma
- **Day 8** - Red Group :  Allen Gardens, TOronto Eaton Centre, St. Lawrence Market, Hokey Hall of fame 
- **Day 9** - Green Group: Art Gallery of Ontario, Ripley's Aquarium, CN Tower and Edge Walk 
- **Day 10** - Blue Group: Toronto Islands & Centreville 	

The next step is to find the best hotels. For me is very important that the rating of the chosen hotel is high and located very close to my POI of the city centre. First thing, I chose the hotels located in the premises of the city centre (18km max) and in order to calculate the travel time I used Google Map's Distance API to check how long was going to take from each hotel to reach each of the centroids of the city centre clusters. The top 5 best performing hotels are collected in the data frame at point 8. 

Based on merely the ratings I would be inclined to choose the The **Grand Hotel & Suites Toronto**. The travel time from each clusters are: 
- to cluster 1: 7min	
- to cluster 2: 19min	
- to cluster 3: 22min 	
- to cluster 4: 30min

which are reasonable distance/duration. 



## Discussion 

I think I am not the only person that dreamed of an automatic "journey Planner". Organising a trip can be very stressful especially because you want to have the best experience (and seeing everything!). However, it is a very subjective topic since we all have different taste and interests when it comes down to places to visit. 

In this particular notebook, the strong assumption was that the website  https://theculturetrip.com could offer a good choice of destinations for any user but the reality is that most of the time, it is not so easy. In a real use case scenario, creating the "array of the destination" might be more difficult than shown in point 1.

A second strong assumption was that, distant destinations were worth visiting for the whole day - In this particular case, it seems that indeed the further destinations were worthwhile spending an entire day (Canada's Wonderland etc.) but in some other use case, this is not true. i.e. you could spend half a day in one distant location, and drive to a further different part of the city and spend the remaining day. 

Another point worth mentioning is that I could have played more with different clustering techniques. For the moment I tried to use the classical method that we have learnt during the lessons of this course. 


## Conclusion 

This notebook can be seen as a tentative to create an automatic journey planner that could help in the initial phase of the organization of a trip. However, as discussed in the previous step, it is difficult to satisfy all the necessity and taste of the user/users. Realistically, this notebook can be used as a template to organize the POI based on the distance and help you scraping information around hotels that you might be interested in. (and it was a good occasion to play with various APIs :) )
