### This Capstone Project is a programme written to fulfil the requirements for completing the IBM Data Science Course.

# An Agile Way to Select Places to Visit for Vacation

## Table of Contents

Objectives

Preliminaries
 - Install and Import Libraries
  - install numpy, pandas, scipy, scikit-learn, imbalanced-learn, requests, folium,
    geopy (from conda-forge)
  - import KMeans, requests, pandas as pd, numpy as np, json, json_normalize, matplotlib.cm,
     matplotlib.colors, Image, HTML, folium, Nominatim
 - Define Functions
  - define haversine_distance, get_category_type

Source for Venue Data
 - Get Location-Specific Geographical Data
  - Define Foursquare Credentials
  - Get Address and Coordinates of a Point of Reference (Holiday Inn)
  - Convert Holiday Inn's Address to its Coordinates
 - Retrieve Venue Data
  - Explore all Venues around Holiday Inn's Coordinates
  - Get Specific Venue Parameters: name, category, latitude, longitude, postal code, ID

Identify Nearby Venues
 - Visualise Nearby Venues on a Map
 - Determine Venue Categories
 - List Nearby Venues
  - Locate Places near Holiday Inn

Venue Distances
 - Calculate Venue Distances

Explore Trending Venues
 - Identify Trending Venues
 - Visualise Trending Venues

Explore Venues (Places) of Personal Interest
 - Explore Venues to Visit
 - Visualise Similar Venues
 
Venue Ratings and Tips
 - Venue Ratings
  - Harvey's Venue Rating
  - Prairie 360's Venue Rating
  - deer + almond's Venue Rating
 - Venue Tips
  - Total Number of Venue Tips for Harvey's
  - Tips for Harvey's
  - Total Number of Venue Tips for Prairie 360
  - Tips for Prairie 360
  - Total Number of Venue Tips for deer + almond
  - Tips for deer + almond
  
Cluster Venues of Personal Interest
 - Group Venues by Postal Code
 - Analyse Venues Using One-Hot Encoding
 - Display Most Common Venues per Postal Code
 - Create Clusters of Venues of Personal Interest
 - Display Venue Clusters
  
Examine Venues of Personal Interest per Cluster
 - Cluster 0
 - Cluster 1
 - Cluster 2
 - Cluster 3
 - Cluster 4

Conclusion

Acknowledgement

## Objectives
To design a programme that will
 - streamline the process of searching for available venues at a particular location and produce a list of preferred venues with acceptable attributes
 - increase the ease and speed of getting required venue information, thereby reducing turnaround times in creating a plan for visiting different holiday venues
 - make it easy to modify the itinerary for a vacation in the event of an unexpected occurrence

## Preliminaries

### Install and Import Libraries

In [1]:
!pip install -U numpy # For scientific computing



In [2]:
!pip install -U pandas # For data analysis and manipulation



In [3]:
!pip install -U scipy==1.4.1 # For numerical computation



In [4]:
!pip install -U scikit-learn # For predictive data analysis



In [5]:
!pip install -U imbalanced-learn # For resampling techniques used in balancing datasets



In [6]:
from sklearn.cluster import KMeans # To form clusters

In [7]:
%pip install requests # To send HTTP requests

Note: you may need to restart the kernel to use updated packages.


In [8]:
import requests # To send HTTP requests to the Foursquare API

In [9]:
import pandas as pd # To create pandas dataframe

In [10]:
import numpy as np # To handle data in a vectorised manner

In [11]:
import json # To handle JSON files

In [12]:
from pandas import json_normalize # To flatten JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

In [13]:
# For creating static, animated and interactive visualisations
import matplotlib.cm as cm
import matplotlib.colors as colors

In [14]:
# To display images and HTML in the notebook
from IPython.display import Image, HTML

In [15]:
# To visualise data on an interactive leaflet map
!pip install folium==0.5.0



In [16]:
import folium # Geospatial data rendering library

In [17]:
!conda install -c conda-forge geopy --yes # To install geopy

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [18]:
from geopy.geocoders import Nominatim # Convert an address to latitude and longitude values

In [19]:
from math import radians, cos, sin, asin, sqrt # For mathematical tasks

In [20]:
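# Define Haversine Distance

def haversine_distance(lat1, lon1, lat2, lon2):
    """Return the great-circle distance in km between two points, rounded to 2 decimal places."""
    r = 6371  # mean radius of the Earth in km
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    # haversine of the central angle between the two points
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    # central angle (in radians) multiplied by the radius gives the arc length
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)

# The haversine formula gives the great-circle distance between two points on a sphere from their latitudes and longitudes.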
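As a sanity check, the same formula can be written with the scalar `math` functions imported in cell 19. The sketch below uses the equivalent `2r·asin(√a)` form and should reproduce the 0.2 km distance that is later reported between Holiday Inn and the Winnipeg Art Gallery. The name `haversine_km` is illustrative and is not part of the original notebook. The sketch assumes a spherical Earth with a radius of 6371 km, and it is meant as a cross-check rather than a replacement for the vectorised function.

```python
from math import radians, cos, sin, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km on a sphere of radius r (illustrative scalar version)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    # haversine of the central angle between the two points
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # 2 * asin(sqrt(a)) is the central angle; equivalent to 2 * atan2(sqrt(a), sqrt(1 - a))
    return 2 * r * asin(sqrt(a))

# Holiday Inn (reference point) and the Winnipeg Art Gallery, coordinates from the venue table
hotel = (49.8911612, -97.15145373256972)
gallery = (49.889459, -97.150708)
print(round(haversine_km(*hotel, *gallery), 2))  # 0.2
```

This result also agrees with the 196 m `distance` that Foursquare reports for the gallery.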

In [21]:
# Define the get_category_type function, which extracts the name of the venue's first-listed category

def get_category_type(row):
    try:
        categories_list = row['categories']  # venues from the trending endpoint
    except KeyError:
        categories_list = row['venue.categories']  # flattened venues from the explore endpoint

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
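The function has to handle two row shapes. Explore results flattened with `json_normalize` store the list under `'venue.categories'`, and trending venues store it under `'categories'`. A quick stand-alone check can use plain dicts, because a missing dict key raises `KeyError` in the same way a missing label does on a pandas Series. This copy of the function catches only `KeyError`. The rows are hypothetical, and their names are borrowed from the venue table.

```python
def get_category_type(row):
    """Return the name of the first-listed category, or None if there is none."""
    try:
        categories_list = row['categories']        # trending-endpoint shape
    except KeyError:
        categories_list = row['venue.categories']  # flattened explore-endpoint shape
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows mimicking the two shapes
explore_row = {'venue.name': "Harvey's", 'venue.categories': [{'name': 'Restaurant'}]}
trending_row = {'name': 'Winnipeg Art Gallery', 'categories': [{'name': 'Art Gallery'}]}
empty_row = {'name': 'Unknown', 'categories': []}

print(get_category_type(explore_row))   # Restaurant
print(get_category_type(trending_row))  # Art Gallery
print(get_category_type(empty_row))     # None
```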

## Source for Venue Data

### Get Location-Specific Geographical Data

##### "The Places API offers real-time access to Foursquare’s global database of rich venue data and user content to power your location-based experiences in your app or website."
https://developer.foursquare.com/docs/places-api/

#### Define Foursquare Credentials

In [22]:
# The code was removed by Watson Studio for sharing.
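The credentials cell was removed before sharing, but later cells rely on `CLIENT_ID`, `CLIENT_SECRET`, `ACCESS_TOKEN` and `VERSION`. One way to define them without hard-coding secrets is to read them from environment variables. The variable names (`FOURSQUARE_CLIENT_ID` and so on) and the example `VERSION` date below are assumptions, not the author's values. The Foursquare v2 API expects `v` as a `YYYYMMDD` date string.

```python
import os

# Hypothetical environment-variable names; set them in your shell before starting Jupyter
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')
ACCESS_TOKEN = os.environ.get('FOURSQUARE_ACCESS_TOKEN', 'your-access-token')
VERSION = '20180604'  # example API version date (YYYYMMDD); an assumption, not the author's value
LIMIT = 100           # default maximum number of venues per request

# Report whether real credentials were found, without printing them
print('Credentials loaded:', CLIENT_ID != 'your-client-id')
```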

#### Get Address and Coordinates of a Point of Reference (Holiday Inn, Winnipeg)

In [23]:
# Create a Nominatim geocoder (user_agent identifies the application to the service)
loc = Nominatim(user_agent="GetLoc")

# Geocode the hotel's name
getLoc = loc.geocode("Holiday Inn & Suites Winnipeg-Downtown, Winnipeg")

print('The coordinates of {} are {}, {}.'.format(getLoc.address, getLoc.latitude, getLoc.longitude))

The coordinates of Holiday Inn & Suites Winnipeg Downtown, 360, Colony Street, Colony, West End, Winnipeg, Manitoba, R3C 0E7, Canada are 49.8911612, -97.15145373256972.


#### Convert Holiday Inn's Address to its Coordinates

In [24]:
address = 'Holiday Inn & Suites Winnipeg-Downtown, Winnipeg, Manitoba, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

49.8911612 -97.15145373256972


### Retrieve Venue Data

#### Explore all Venues around Holiday Inn's Coordinates

In [25]:
# Build the explore URL for the area around Holiday Inn
radius = 1500 # search radius in metres
limit = 100 # maximum number of venues to return

url_holiday_inn = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, limit)
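Building the query string by hand works. Alternatively, `urllib.parse.urlencode` from the standard library escapes each value and keeps every parameter name next to its value. The sketch below uses placeholder credentials, and `build_explore_url` is a hypothetical helper. With `requests`, passing the same dictionary as `params=` to `requests.get` has the same effect.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, lat, lng, version, radius=1500, limit=100):
    """Return the venues/explore URL with an encoded query string (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"; the comma is encoded as %2C
        'v': version,
        'radius': radius,                # metres
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url('ID', 'SECRET', 49.8911612, -97.15145373256972, '20180604')
print(url)
```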

In [26]:
# Send a GET request to the explore endpoint to find the venues around Holiday Inn
results_venues = requests.get(url_holiday_inn).json()
'There are {} venues around Holiday Inn.'.format(len(results_venues['response']['groups'][0]['items']))

'There are 100 venues around Holiday Inn.'

In [27]:
# Get relevant part of the JSON file (results_venues)
items = results_venues['response']['groups'][0]['items']
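If the request failed (for example, because of invalid credentials or an exhausted quota), indexing `['response']['groups'][0]['items']` directly raises a `KeyError` or `IndexError` that says little about the cause. Foursquare v2 responses carry a `meta` object whose `code` is 200 on success, so the hypothetical helper below checks that code first. The error fields are read with `.get()` because their presence is assumed rather than guaranteed. The payloads are hand-made stand-ins, not real API output.

```python
def get_explore_items(payload):
    """Return the venue items from a venues/explore response, or raise a clear error."""
    meta = payload.get('meta', {})
    if meta.get('code') != 200:
        raise RuntimeError('Foursquare request failed: {} {}'.format(
            meta.get('errorType', 'unknown error'), meta.get('errorDetail', '')))
    groups = payload.get('response', {}).get('groups', [])
    return groups[0]['items'] if groups else []

# Hand-made payloads for illustration
ok_payload = {'meta': {'code': 200},
              'response': {'groups': [{'items': [{'venue': {'name': "Harvey's"}}]}]}}
bad_payload = {'meta': {'code': 401, 'errorType': 'invalid_auth',
                        'errorDetail': 'Missing access credentials.'}}

print(len(get_explore_items(ok_payload)))  # 1
try:
    get_explore_items(bad_payload)
except RuntimeError as err:
    print(err)
```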

#### Get Specific Venue Parameters: name, category, latitude, longitude, postal code, ID

In [28]:
# Process JSON file (results_venues) and convert it to a clean dataframe
df_venues = json_normalize(items) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in df_venues.columns if col.startswith('venue.location.')] + ['venue.id']
df_venues_filtered = df_venues.loc[:, filtered_columns]

# filter the category for each row
df_venues_filtered['venue.categories'] = df_venues_filtered.apply(get_category_type, axis=1)

# clean columns
df_venues_filtered.columns = [col.split('.')[-1] for col in df_venues_filtered.columns]

df_venues_filtered.head()



| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Winnipeg Art Gallery | Art Gallery | 300 Memorial Blvd | St. Mary Ave | 49.889459 | -97.150708 | [{'label': 'display', 'lat': 49.88945877797471... | 196 | R3C 1V1 | CA | Winnipeg | MB | Canada | [300 Memorial Blvd (St. Mary Ave), Winnipeg MB... | | 4b5f6471f964a520b4b729e3 |
| 1 | Harvey's | Restaurant | 600 Portage Avenue | | 49.888922 | -97.156059 | [{'label': 'display', 'lat': 49.88892237017493... | 413 | R3C 3L7 | CA | Winnipeg | MB | Canada | [600 Portage Avenue, Winnipeg MB R3C 3L7, Canada] | West Broadway | 4edd1bb277c8274e005b3e73 |
| 2 | Stella's Café and Bakery | Café | 2-460 Portage Ave. | at Colony St. | 49.890672 | -97.151225 | [{'label': 'display', 'lat': 49.89067242317363... | 56 | R3C 0E8 | CA | Winnipeg | MB | Canada | [2-460 Portage Ave. (at Colony St.), Winnipeg ... | | 4e35a0d4d22d86185a6e2f72 |
| 3 | Thom Bargen coffee & tea | Coffee Shop | 250 Kennedy | Graham Ave | 49.891012 | -97.148059 | [{'label': 'display', 'lat': 49.891012, 'lng':... | 244 | R3M 1Y1 | CA | Winnipeg | MB | Canada | [250 Kennedy (Graham Ave), Winnipeg MB R3M 1Y1... | | 56e0429fcd10ce07507635b3 |
| 4 | Starbucks | Coffee Shop | 412 Graham Ave | at Kennedy St. | 49.890856 | -97.147947 | [{'label': 'display', 'lat': 49.89085632619919... | 253 | R3C 0L8 | CA | Winnipeg | MB | Canada | [412 Graham Ave (at Kennedy St.), Winnipeg MB ... | | 4b673b2af964a520b3422be3 |
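The sketch below shows why the flattened column names start with `venue.` and why keeping the text after the last `.` cleans them. It runs on two hand-made explore items whose values are copied from the table above, with nesting that mirrors the keys the notebook selects. It uses `pandas.json_normalize`, which has been a top-level pandas function since version 1.0.

```python
import pandas as pd

# Two hand-made items shaped like venues/explore results
items = [
    {'venue': {'id': '4b5f6471f964a520b4b729e3', 'name': 'Winnipeg Art Gallery',
               'categories': [{'name': 'Art Gallery'}],
               'location': {'lat': 49.889459, 'lng': -97.150708, 'postalCode': 'R3C 1V1'}}},
    {'venue': {'id': '4edd1bb277c8274e005b3e73', 'name': "Harvey's",
               'categories': [{'name': 'Restaurant'}],
               'location': {'lat': 49.888922, 'lng': -97.156059, 'postalCode': 'R3C 3L7'}}},
]

# Nested dicts become dotted column names; lists such as venue.categories stay as lists
flat = pd.json_normalize(items)
print(sorted(flat.columns))

# Keep only the last dotted component, as the notebook does
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat[['name', 'lat', 'lng', 'postalCode']])
```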


## Identify Nearby Venues

### Visualise Nearby Venues on a Map

In [29]:
hi_venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Holiday Inn's coordinates.


# add Holiday Inn as a red circle marker
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Holiday Inn',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(hi_venues_map)


# add popular spots to the map as blue circle markers
for lat, lng, label in zip(df_venues_filtered.lat, df_venues_filtered.lng, df_venues_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(hi_venues_map)

# display map
hi_venues_map

### Determine Venue Categories

In [30]:
# Number of venues returned by Foursquare
print('{} venues around {} were returned by Foursquare'.format(df_venues_filtered.shape[0], getLoc.address))

100 venues around Holiday Inn & Suites Winnipeg Downtown, 360, Colony Street, Colony, West End, Winnipeg, Manitoba, R3C 0E7, Canada were returned by Foursquare


In [31]:
# Determine how many unique venue categories can be curated from all the returned venues
print('There are {} unique venue categories around {}.'.format(len(df_venues_filtered['categories'].unique()), getLoc.address))

There are 58 unique venue categories around Holiday Inn & Suites Winnipeg Downtown, 360, Colony Street, Colony, West End, Winnipeg, Manitoba, R3C 0E7, Canada.


In [32]:
# Count how many venues fall into each category and show the two most common
df_venues_filtered['categories'].value_counts().head(2)

Coffee Shop         7
Asian Restaurant    6
Name: categories, dtype: int64

In [33]:
# Group by category, count the non-null values in every column, and sort by the count in the 'name' column (largest first)
df_venues_filtered.groupby('categories').count().sort_values('name', ascending=False).reset_index().head(2)

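| | categories | name | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Coffee Shop | 7 | 7 | 5 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 0 | 7 |
| 1 | Sushi Restaurant | 6 | 6 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 0 | 6 |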
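`groupby(...).count()` counts non-null values column by column, which is why `crossStreet` and `neighborhood` show smaller numbers than `name`. When only the number of venues per category is needed, `groupby(...).size()` (or `value_counts()`) returns a single column instead. A minimal sketch on a hand-made frame:

```python
import pandas as pd

venues = pd.DataFrame({
    'name': ['Starbucks', 'Thom Bargen coffee & tea', 'Asia City', 'Winnipeg Art Gallery'],
    'categories': ['Coffee Shop', 'Coffee Shop', 'Asian Restaurant', 'Art Gallery'],
    'crossStreet': ['at Kennedy St.', 'Graham Ave', None, 'St. Mary Ave'],
})

# Non-null values per column within each category (crossStreet is missing for Asia City)
per_column = venues.groupby('categories').count()

# One number per category: the group size, sorted from most to least common
per_category = venues.groupby('categories').size().sort_values(ascending=False)

print(per_column)
print(per_category)
```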


In [34]:
# Sort the rows by 'categories' in descending (reverse alphabetical) order
df_venues_filtered.sort_values('categories', ascending=False).head(2)

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 78 | Pho No.1 | Vietnamese Restaurant | 81 Isabel St. | | 49.900449 | -97.150537 | [{'label': 'display', 'lat': 49.90044933549933... | 1036 | R3A 1E8 | CA | Winnipeg | MB | Canada | [81 Isabel St., Winnipeg MB R3A 1E8, Canada] | | 4b75b5e4f964a520d81e2ee3 |
| 9 | Viva Restaurant | Vietnamese Restaurant | 1-505 Sargent Ave | | 49.896364 | -97.154059 | [{'label': 'display', 'lat': 49.896364, 'lng':... | 608 | R3B 1V9 | CA | Winnipeg | MB | Canada | [1-505 Sargent Ave, Winnipeg MB R3B 1V9, Canada] | | 4bdce063462b2d7fdbc3113c |


In [35]:
# Sort the rows by 'name' in descending order
df_venues_filtered.sort_values('name', ascending=False).head(2)

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | deer + almond | Restaurant | 85 Princess St. | at McDermot Ave. | 49.897755 | -97.142537 | [{'label': 'display', 'lat': 49.89775472591209... | 973 | R3B 1K6 | CA | Winnipeg | MB | Canada | [85 Princess St. (at McDermot Ave.), Winnipeg ... | | 4f5832dfe4b032fce5f1f4a1 |
| 36 | Yoga Public | Gym / Fitness Center | 280 Fort St. | | 49.894146 | -97.13948 | [{'label': 'display', 'lat': 49.89414584305382... | 920 | R3C 1E5 | CA | Winnipeg | MB | Canada | [280 Fort St., Winnipeg MB R3C 1E5, Canada] | | 4f14ed12c2ee45f86ab592c6 |
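'deer + almond' tops this descending sort because Python (and therefore pandas) compares strings by code point, so lowercase letters sort after all uppercase letters. If a case-insensitive order is wanted, `sort_values` accepts a `key` function (pandas 1.1 or later). A sketch on three names from the table:

```python
import pandas as pd

names = pd.DataFrame({'name': ['Winnipeg Art Gallery', 'deer + almond', 'Yoga Public']})

# Default: code-point order, so lowercase 'd' ranks above uppercase 'Y' and 'W'
default_desc = names.sort_values('name', ascending=False)

# key= receives the whole column as a Series; lower-casing it gives a case-insensitive sort
ci_desc = names.sort_values('name', ascending=False, key=lambda s: s.str.lower())

print(default_desc['name'].tolist())  # ['deer + almond', 'Yoga Public', 'Winnipeg Art Gallery']
print(ci_desc['name'].tolist())       # ['Yoga Public', 'Winnipeg Art Gallery', 'deer + almond']
```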


### List Nearby Venues

#### Locate places near Holiday Inn

##### Define 'hi_nearby_venues'

In [36]:
# Clean the JSON and structure it into a pandas dataframe, noting that the required information is in the 'items' key

venues = results_venues['response']['groups'][0]['items']

hi_nearby_venues = json_normalize(venues) # flatten JSON 

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
hi_nearby_venues = hi_nearby_venues.loc[:, filtered_columns] 

# filter the category for each row
hi_nearby_venues['venue.categories'] = hi_nearby_venues.apply(get_category_type, axis=1) 

# clean columns 
hi_nearby_venues.columns = [col.split(".")[-1] for col in hi_nearby_venues.columns]

hi_nearby_venues.head()



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Winnipeg Art Gallery | Art Gallery | 49.889459 | -97.150708 |
| 1 | Harvey's | Restaurant | 49.888922 | -97.156059 |
| 2 | Stella's Café and Bakery | Café | 49.890672 | -97.151225 |
| 3 | Thom Bargen coffee & tea | Coffee Shop | 49.891012 | -97.148059 |
| 4 | Starbucks | Coffee Shop | 49.890856 | -97.147947 |


In [37]:
# Group the venues by category and count how many venues exist in each category
df_hi = hi_nearby_venues.groupby('categories').count().sort_values('name', ascending=False)
df_hi.reset_index(inplace=True)
df_hi.head()

| | categories | name | lat | lng |
|---|---|---|---|---|
| 0 | Coffee Shop | 7 | 7 | 7 |
| 1 | Sushi Restaurant | 6 | 6 | 6 |
| 2 | Asian Restaurant | 6 | 6 | 6 |
| 3 | Café | 5 | 5 | 5 |
| 4 | Vietnamese Restaurant | 4 | 4 | 4 |


In [38]:
print('There are {} {}s, {} {}s, {} {}s, and {} {}s that can be used to create different experiences for the family.'.format(df_hi.iloc[0,1], df_hi.iloc[0,0], df_hi.iloc[1,1], df_hi.iloc[1,0], df_hi.iloc[2,1], df_hi.iloc[2,0], df_hi.iloc[3,1], df_hi.iloc[3,0]))

There are 7 Coffee Shops, 6 Sushi Restaurants, 6 Asian Restaurants, and 5 Cafés that can be used to create different experiences for the family.
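The sentence above depends on exactly four fixed placeholders, so it breaks if fewer categories are available. The sketch below builds the same kind of summary from any number of (count, category) pairs. `summarise` is a hypothetical helper, and its naive "add an s" plural is an assumption that happens to suit these category names. `collections.Counter` stands in for the pandas grouping so the example runs on its own.

```python
from collections import Counter

def summarise(pairs):
    """Turn (count, category) pairs into one readable sentence (hypothetical helper)."""
    parts = ['{} {}{}'.format(n, cat, '' if n == 1 else 's') for n, cat in pairs]
    if len(parts) > 1:
        parts[-1] = 'and ' + parts[-1]
    # Oxford comma only when there are three or more items
    joined = ', '.join(parts) if len(parts) > 2 else ' '.join(parts)
    return 'There are {} that can be used to create different experiences for the family.'.format(joined)

# Illustrative category list; in the notebook this would be hi_nearby_venues['categories']
categories = ['Coffee Shop'] * 7 + ['Sushi Restaurant'] * 6 + ['Café'] * 5 + ['Art Gallery']
top = [(n, cat) for cat, n in Counter(categories).most_common(4)]
print(summarise(top))
```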


##### Re-arrange the columns of the dataframe

In [39]:
# Move the column 'id' from the last-column position to fifth-column position for easy visibility
column5 = df_venues_filtered.pop('id') # Remove the last column 'id' from the original dataframe and place it in a variable
df_venues_filtered.insert(4, 'id', column5) # Add the column 'id' to index position 4 (that is, column 5 position) in the dataframe
df_venues_filtered.head(2) # Display the first two rows of the dataframe to confirm that the column was correctly relocated

| | name | categories | address | crossStreet | id | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Winnipeg Art Gallery | Art Gallery | 300 Memorial Blvd | St. Mary Ave | 4b5f6471f964a520b4b729e3 | 49.889459 | -97.150708 | [{'label': 'display', 'lat': 49.88945877797471... | 196 | R3C 1V1 | CA | Winnipeg | MB | Canada | [300 Memorial Blvd (St. Mary Ave), Winnipeg MB... | |
| 1 | Harvey's | Restaurant | 600 Portage Avenue | | 4edd1bb277c8274e005b3e73 | 49.888922 | -97.156059 | [{'label': 'display', 'lat': 49.88892237017493... | 413 | R3C 3L7 | CA | Winnipeg | MB | Canada | [600 Portage Avenue, Winnipeg MB R3C 3L7, Canada] | West Broadway |


## Venue Distances

### Calculate Venue Distances

##### Get the distance from Holiday Inn to the venues that may be included in family outings.

In [40]:
# Define the reference values for latitude and longitude
start_lat, start_lon = latitude, longitude

In [41]:
# Create a list, 'distances_km', to store the distance (in km) from Holiday Inn to each venue, calculated with the haversine formula

distances_km = []

for row in df_venues_filtered.itertuples(index=False):
    distances_km.append(
        haversine_distance(start_lat, start_lon, row.lat, row.lng))

In [42]:
# Create a new column in 'df_venues_filtered' to hold the distances stored in 'distance_km' variable
df_venues_filtered['DistanceFromHolidayInn'] = distances_km
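Because `haversine_distance` is built from NumPy functions, it also accepts whole columns. That means the loop and the column assignment above can collapse into one vectorised call. The self-contained sketch below repeats the function and uses three venues from the table. The frame name `sample` is illustrative. The expected distances (0.2, 0.41 and 0.64 km) match the values reported in this section.

```python
import numpy as np
import pandas as pd

def haversine_distance(lat1, lon1, lat2, lon2):
    r = 6371  # km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    return np.round(r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))), 2)

start_lat, start_lon = 49.8911612, -97.15145373256972  # Holiday Inn

sample = pd.DataFrame({
    'name': ['Winnipeg Art Gallery', "Harvey's", 'Asia City'],
    'lat': [49.889459, 49.888922, 49.896464],
    'lng': [-97.150708, -97.156059, -97.154857],
})

# The scalar start point broadcasts against the Series, so one call computes every distance
sample['DistanceFromHolidayInn'] = haversine_distance(start_lat, start_lon, sample['lat'], sample['lng'])
print(sample)
```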

In [43]:
# Sort the new 'df_venues_filtered' dataframe according to the columns 'categories', 'name', 'DistanceFromHolidayInn' and store in a new variable, 'winnipeg_venues'

winnipeg_venues = df_venues_filtered.sort_values(['categories', 'name', 'DistanceFromHolidayInn'])
winnipeg_venues.head(2)

| | name | categories | address | crossStreet | id | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | DistanceFromHolidayInn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Winnipeg Art Gallery | Art Gallery | 300 Memorial Blvd | St. Mary Ave | 4b5f6471f964a520b4b729e3 | 49.889459 | -97.150708 | [{'label': 'display', 'lat': 49.88945877797471... | 196 | R3C 1V1 | CA | Winnipeg | MB | Canada | [300 Memorial Blvd (St. Mary Ave), Winnipeg MB... | | 0.2 |
| 43 | Asia City | Asian Restaurant | 519 Sargent Ave | | 4da753106a2364c7a33bbb06 | 49.896464 | -97.154857 | [{'label': 'display', 'lat': 49.89646350846065... | 638 | R3B 1W1 | CA | Winnipeg | MB | Canada | [519 Sargent Ave, Winnipeg MB R3B 1W1, Canada] | | 0.64 |


##### Re-arrange 'winnipeg_venues' to show the 'DistanceFromHolidayInn' at first glance

In [44]:
# Move the column 'DistanceFromHolidayInn' from the last-column position to sixth-column position for easy visibility
column6 = winnipeg_venues.pop('DistanceFromHolidayInn') # Remove the last column 'DistanceFromHolidayInn' from the original dataframe and place it in a new variable, 'column6'
winnipeg_venues.insert(5, 'DistanceFromHolidayInn', column6) # Add the column 'DistanceFromHolidayInn' to index position 5 (that is, column 6 position) in the dataframe
winnipeg_venues.head(2) # Display the first two rows of the new dataframe to confirm that the column was correctly relocated

| | name | categories | address | crossStreet | id | DistanceFromHolidayInn | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Winnipeg Art Gallery | Art Gallery | 300 Memorial Blvd | St. Mary Ave | 4b5f6471f964a520b4b729e3 | 0.2 | 49.889459 | -97.150708 | [{'label': 'display', 'lat': 49.88945877797471... | 196 | R3C 1V1 | CA | Winnipeg | MB | Canada | [300 Memorial Blvd (St. Mary Ave), Winnipeg MB... | |
| 43 | Asia City | Asian Restaurant | 519 Sargent Ave | | 4da753106a2364c7a33bbb06 | 0.64 | 49.896464 | -97.154857 | [{'label': 'display', 'lat': 49.89646350846065... | 638 | R3B 1W1 | CA | Winnipeg | MB | Canada | [519 Sargent Ave, Winnipeg MB R3B 1W1, Canada] | |


## Explore Trending Venues

### Identify Trending Venues

In [45]:
# define URL
url_trend = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION)

In [46]:
# send GET request and get trending venues
results_trend = requests.get(url_trend).json()

In [47]:
if len(results_trend['response']['venues']) == 0:
    trending_venues_df = 'No trending venues are available at the moment!'
    
else:
    trending_venues = results_trend['response']['venues']
    trending_venues_df = json_normalize(trending_venues)

    # filter columns
    columns_filtered = ['name', 'categories'] + ['location.distance', 'location.city', 'location.postalCode', 'location.state', 'location.country', 'location.lat', 'location.lng']
    trending_venues_df = trending_venues_df.loc[:, columns_filtered]

    # filter the category for each row
    trending_venues_df['categories'] = trending_venues_df.apply(get_category_type, axis=1)

In [48]:
# display trending venues
trending_venues_df

'No trending venues are available at the moment!'

### Visualise Trending Venues

In [49]:
if len(results_trend['response']['venues']) == 0:
    venues_map = 'Cannot generate visual as no trending venues are available at the moment!'

else:
    venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Holiday Inn


    # add Holiday Inn as a red circle marker
    folium.CircleMarker(
        [latitude, longitude],
        radius=10,
        popup='Holiday Inn',
        fill=True,
        color='red',
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)


    # add the trending venues as blue circle markers
    for lat, lng, label in zip(trending_venues_df['location.lat'], trending_venues_df['location.lng'], trending_venues_df['name']):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            fill=True,
            color='blue',
            fill_color='blue',
            fill_opacity=0.6
        ).add_to(venues_map)

In [50]:
# Display Map

venues_map

'Cannot generate visual as no trending venues are available at the moment!'

## Explore Venues (Places) of Personal Interest

### Explore Venues to Visit

In [51]:
# Create a list of venue categories of personal interest
# (Foursquare may label gyms as 'Gym / Fitness Center', which an exact match on 'Gym' will miss)
options = ['Gym', 'Food Court', 'Restaurant', 'Pizza Place', 'Art Gallery', 'Coffee Shop', 'Theater', 'Juice Bar']

# Keep only the rows whose category is in "options"; .copy() avoids a SettingWithCopyWarning when sorting
hi_list = winnipeg_venues[winnipeg_venues['categories'].isin(options)].copy()
hi_list = hi_list.sort_values(by='categories')

hi_list.head(2)



| | name | categories | address | crossStreet | id | DistanceFromHolidayInn | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Winnipeg Art Gallery | Art Gallery | 300 Memorial Blvd | St. Mary Ave | 4b5f6471f964a520b4b729e3 | 0.2 | 49.889459 | -97.150708 | [{'label': 'display', 'lat': 49.88945877797471... | 196 | R3C 1V1 | CA | Winnipeg | MB | Canada | [300 Memorial Blvd (St. Mary Ave), Winnipeg MB... | |
| 32 | Little Sister Coffee Maker | Coffee Shop | A-470 River Ave | | 5227ba6011d28632966c43a0 | 1.39 | 49.878993 | -97.146772 | [{'label': 'display', 'lat': 49.87899300682746... | 1395 | R3L 0C8 | CA | Winnipeg | MB | Canada | [A-470 River Ave, Winnipeg MB R3L 0C8, Canada] | |
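Two details of this filter are worth illustrating. First, the boolean mask should come from the same frame that is being indexed, because otherwise pandas aligns the mask by index and warns. Second, `isin` matches whole strings, so the option 'Gym' does not select a venue such as Yoga Public whose category is 'Gym / Fitness Center'. The sketch below uses hand-made rows. Its prefix match is one possible alternative, not what the notebook does.

```python
import pandas as pd

venues = pd.DataFrame({
    'name': ['Winnipeg Art Gallery', 'Yoga Public', 'Starbucks', 'Asia City'],
    'categories': ['Art Gallery', 'Gym / Fitness Center', 'Coffee Shop', 'Asian Restaurant'],
})
options = ['Gym', 'Art Gallery', 'Coffee Shop']

# Exact matching with a mask built from the same frame: 'Gym' misses 'Gym / Fitness Center'
exact = venues[venues['categories'].isin(options)].copy()

# Prefix matching: keep a row if its category starts with any of the options
prefix_mask = venues['categories'].apply(lambda c: any(c.startswith(o) for o in options))
prefix = venues[prefix_mask].copy()

print(exact['name'].tolist())   # ['Winnipeg Art Gallery', 'Starbucks']
print(prefix['name'].tolist())  # ['Winnipeg Art Gallery', 'Yoga Public', 'Starbucks']
```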


#### Explore Restaurants near the Hotel

In [52]:
# Get the details of the restaurants near the hotel
winnipeg_restaurant = winnipeg_venues[(winnipeg_venues.categories=='Restaurant')]
winnipeg_restaurant

| | name | categories | address | crossStreet | id | DistanceFromHolidayInn | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Harvey's | Restaurant | 600 Portage Avenue | | 4edd1bb277c8274e005b3e73 | 0.41 | 49.888922 | -97.156059 | [{'label': 'display', 'lat': 49.88892237017493... | 413 | R3C 3L7 | CA | Winnipeg | MB | Canada | [600 Portage Avenue, Winnipeg MB R3C 3L7, Canada] | West Broadway |
| 61 | Prairie 360 | Restaurant | 83 Garry Street | | 5203fd442fc63a7e19f4f4cb | 1.16 | 49.887467 | -97.136342 | [{'label': 'display', 'lat': 49.8874673755844,... | 1159 | R3C 0R3 | CA | Winnipeg | MB | Canada | [83 Garry Street, Winnipeg MB R3C 0R3, Canada] | |
| 13 | deer + almond | Restaurant | 85 Princess St. | at McDermot Ave. | 4f5832dfe4b032fce5f1f4a1 | 0.97 | 49.897755 | -97.142537 | [{'label': 'display', 'lat': 49.89775472591209... | 973 | R3B 1K6 | CA | Winnipeg | MB | Canada | [85 Princess St. (at McDermot Ave.), Winnipeg ... | |


### Visualise Similar Venues

In [53]:
# Display the restaurants in relation to the hotel on a map
restaurant_map = folium.Map(location=[latitude, longitude], zoom_start=16) # generate map centred around "Holiday Inn"

# add a red circle marker to represent the Holiday Inn
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Holiday Inn',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(restaurant_map)

# add the restaurants as blue circle markers
for lat, lng, label in zip(winnipeg_restaurant.lat, winnipeg_restaurant.lng, winnipeg_restaurant.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(restaurant_map)

# display map
restaurant_map

## Venue Ratings and Tips

### Venue Ratings

#### Harvey's Venue Rating

In [54]:
venue_id = '4edd1bb277c8274e005b3e73' # ID of Harvey's
url_harvey = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)

In [55]:
# Send GET request and store the venue details in harvey_result
harvey_result = requests.get(url_harvey).json()

In [56]:
# Get the venue's overall rating
try:
    print(harvey_result['response']['venue']['rating'])
except KeyError:
    print('This venue has not been rated yet.')

8.8
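The three rating cells differ only in the venue ID. The sketch below shows a helper that reads the rating with chained `dict.get` calls and returns `None` when the venue has no rating, instead of relying on an exception. `get_rating` is hypothetical. The payloads are hand-made stand-ins for the venue-details response and use ratings reported in this section. In the notebook, the helper could be called as `get_rating(requests.get(url_harvey).json())`.

```python
def get_rating(venue_payload):
    """Return the venue's rating, or None if the venue has not been rated (hypothetical helper)."""
    return venue_payload.get('response', {}).get('venue', {}).get('rating')

# Hand-made stand-ins for the JSON returned by the venue-details endpoint
responses = {
    "Harvey's": {'response': {'venue': {'rating': 8.8}}},
    'Prairie 360': {'response': {'venue': {'rating': 7.7}}},
    'Unrated venue': {'response': {'venue': {}}},
}

for name, payload in responses.items():
    rating = get_rating(payload)
    print(name, rating if rating is not None else 'This venue has not been rated yet.')
```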


#### Prairie 360's Venue Rating

In [57]:
venue_id = '5203fd442fc63a7e19f4f4cb' # ID of Prairie 360
url_prairie360 = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)

In [58]:
# Send GET request and store the venue details in prairie360_result
prairie360_result = requests.get(url_prairie360).json()

In [59]:
# Get the venue's overall rating
try:
    print(prairie360_result['response']['venue']['rating'])
except KeyError:
    print('This venue has not been rated yet.')

7.7


#### deer + almond's Venue Rating

In [60]:
venue_id = '4f5832dfe4b032fce5f1f4a1' # ID of deer + almond
url_deer_almond = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)

In [61]:
# Send GET request and store the venue details in deer_almond_result
deer_almond_result = requests.get(url_deer_almond).json()

In [62]:
# Get the venue's overall rating
try:
    print(deer_almond_result['response']['venue']['rating'])
except KeyError:
    print('This venue has not been rated yet.')

8.7


### Venue Tips

#### Total Number of Venue Tips for Harvey's

In [63]:
# Get total number of venue's tips

venue_id = '4edd1bb277c8274e005b3e73' # ID of Harvey's
url_harvey = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

In [64]:
# Send GET request for 'results'
harvey_result = requests.get(url_harvey).json()

In [65]:
harvey_result['response']['venue']['tips']['count']

11

#### Tips for Harvey's

In [66]:
# Limit should be a number that is equal to or greater than the total number of tips

limit = 12
venue_id = '4edd1bb277c8274e005b3e73' # ID of Harvey's
url_harvey_tips = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)

In [67]:
# Send GET request for result
results_harvey_tips = requests.get(url_harvey_tips).json()

In [68]:
# Get tips
tips_harvey = results_harvey_tips['response']['tips']['items']

In [69]:
tip_harvey = results_harvey_tips['response']['tips']['items'][0]
tip_harvey.keys()

dict_keys(['id', 'createdAt', 'text', 'type', 'canonicalUrl', 'lang', 'likes', 'logView', 'agreeCount', 'disagreeCount', 'todo', 'user'])

In [70]:
pd.set_option('display.max_colwidth', None)

harvey_tips_df = json_normalize(tips_harvey) # json normalize tips

# columns to keep
harvey_filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName', 'user.lastName']
harvey_tips_filtered = harvey_tips_df.loc[:, harvey_filtered_columns]


# display tips
harvey_tips_filtered

# A maximum of 1 tip will be shown since a sandbox account is used. A personal account would display up to 2 tips.



| | text | agreeCount | disagreeCount | id | user.firstName | user.lastName |
|---|---|---|---|---|---|---|
| 0 | Best customer service of any fast food restaurant in Winnipeg. (And better than many of the full-service restaurants too!) | 0 | 0 | 4f39a37ae4b03d32dee3e2d7 | Greg | P |
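The three tip tables select almost the same columns, except that the deer + almond cell leaves out `user.lastName`. A helper that tolerates missing fields removes the need for a separate column list per venue. `extract_tips` is hypothetical. The payload is hand-made and mirrors the keys listed by `tip.keys()` in this section. The resulting list of dicts can be passed directly to `pd.DataFrame(...)`.

```python
def extract_tips(tips_payload):
    """Flatten the tips in a venues/{id}/tips response into simple dicts (hypothetical helper)."""
    rows = []
    for tip in tips_payload.get('response', {}).get('tips', {}).get('items', []):
        user = tip.get('user', {})
        rows.append({
            'text': tip.get('text'),
            'agreeCount': tip.get('agreeCount', 0),
            'disagreeCount': tip.get('disagreeCount', 0),
            'id': tip.get('id'),
            'firstName': user.get('firstName'),
            'lastName': user.get('lastName'),  # None when the field is absent
        })
    return rows

payload = {'response': {'tips': {'count': 2, 'items': [
    {'id': 'tip-1', 'text': 'Great service.', 'agreeCount': 0, 'disagreeCount': 0,
     'user': {'firstName': 'Greg', 'lastName': 'P'}},
    {'id': 'tip-2', 'text': 'Worth the detour.', 'agreeCount': 2, 'disagreeCount': 0,
     'user': {'firstName': 'Max'}},
]}}}

for row in extract_tips(payload):
    print(row)
```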


#### Total Number of Venue Tips for Prairie 360

In [71]:
# Get total number of venue's tips

venue_id = '5203fd442fc63a7e19f4f4cb' # ID of Prairie 360
url_prairie360 = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

In [72]:
# Send GET request for 'results'
prairie360_result = requests.get(url_prairie360).json()

In [73]:
prairie360_result['response']['venue']['tips']['count']

10

#### Tips for Prairie 360

In [74]:
# Limit should be a number that is equal to or greater than the total number of tips

limit = 13
venue_id = '5203fd442fc63a7e19f4f4cb' # ID of Prairie 360
url_prairie360_tips = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)

In [75]:
# Send GET request for result
results_prairie360_tips = requests.get(url_prairie360_tips).json()

In [76]:
tips_prairie360 = results_prairie360_tips['response']['tips']['items']

In [77]:
tip_prairie360 = results_prairie360_tips['response']['tips']['items'][0]
tip_prairie360.keys()

dict_keys(['id', 'createdAt', 'text', 'type', 'canonicalUrl', 'lang', 'likes', 'logView', 'agreeCount', 'disagreeCount', 'todo', 'user'])

In [78]:
pd.set_option('display.max_colwidth', None)

prairie360_tips_df = json_normalize(tips_prairie360) # json normalize tips

# columns to keep
prairie360_filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName', 'user.lastName']
prairie360_tips_filtered = prairie360_tips_df.loc[:, prairie360_filtered_columns]


# display tips
prairie360_tips_filtered

# A maximum of 1 tip will be shown since a sandbox account is used. A personal account would display up to 2 tips.



| | text | agreeCount | disagreeCount | id | user.firstName | user.lastName |
|---|---|---|---|---|---|---|
| 0 | The steak alone was great but with gnocchi, lovely. East meets west Arctic Char was splendid. Just a bubbly to share? Charcuterie is the answer. | 0 | 0 | 5b553b59d69ed0002c1b326c | Colleen | P |


#### Total Number of Venue Tips for deer + almond

In [79]:
# Get total number of venue's tips

venue_id = '4f5832dfe4b032fce5f1f4a1' # ID of deer + almond
url_deer_almond = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

In [80]:
# Send GET request for 'results'
deer_almond_result = requests.get(url_deer_almond).json()

In [81]:
deer_almond_result['response']['venue']['tips']['count']

16

#### Tips for deer + almond

In [82]:
# Limit should be a number that is equal to or greater than the total number of tips

limit = 16 # deer + almond has 16 tips
venue_id = '4f5832dfe4b032fce5f1f4a1' # ID of deer + almond
url_deer_almond_tips = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)

In [83]:
# Send GET request for result
results_deer_almond_tips = requests.get(url_deer_almond_tips).json()

In [84]:
tips_deer_almond = results_deer_almond_tips['response']['tips']['items']

In [85]:
tip_deer_almond = results_deer_almond_tips['response']['tips']['items'][0]
tip_deer_almond.keys()

dict_keys(['id', 'createdAt', 'text', 'type', 'canonicalUrl', 'lang', 'likes', 'logView', 'agreeCount', 'disagreeCount', 'todo', 'user'])

In [86]:
pd.set_option('display.max_colwidth', None)

deer_almond_tips_df = json_normalize(tips_deer_almond) # json normalize tips

# columns to keep
deer_almond_filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName']
deer_almond_tips_filtered = deer_almond_tips_df.loc[:, deer_almond_filtered_columns]


# display tips
deer_almond_tips_filtered

# A maximum of 1 tip will be shown since a sandbox account is used. A personal account would display up to 2 tips.



| | text | agreeCount | disagreeCount | id | user.firstName |
|---|---|---|---|---|---|
| 0 | We found this gem a bit off tourist path. Everything look good and what we ordered was very tasty. Worth the detour. | 2 | 0 | 577e537d498e1acf7f793cff | Max |
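
The two tip tables above keep different column lists, because `user.lastName` is left out for deer + almond. The following is a minimal sketch of a reusable helper. It assumes the response layout used above (`response` → `tips` → `items`). The helper name `tips_to_dataframe` and the sample response are hypothetical. The helper calls `pd.json_normalize` (pandas 1.0 and later) and then `reindex`, which fills a requested column that is missing from the response with `NaN`. By contrast, `.loc` with a missing column label raises a `KeyError` in recent pandas versions.

```python
import pandas as pd


def tips_to_dataframe(tips_response, columns):
    """Flatten the 'items' list of a Foursquare tips response into a DataFrame.

    Requested columns absent from the response (e.g. 'user.lastName') become NaN.
    """
    items = tips_response["response"]["tips"]["items"]
    df = pd.json_normalize(items)  # nested keys become 'user.firstName', etc.
    return df.reindex(columns=columns)


# Hypothetical response trimmed to the keys used above; the user has no last name.
sample = {"response": {"tips": {"count": 1, "items": [
    {"id": "t1", "text": "Worth the detour.", "agreeCount": 2, "disagreeCount": 0,
     "user": {"firstName": "Max"}},
]}}}

cols = ["text", "agreeCount", "disagreeCount", "id", "user.firstName", "user.lastName"]
tips = tips_to_dataframe(sample, cols)
print(tips)
```
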


## Cluster Venues of Personal Interest

#### Group Venues by Postal Code

In [87]:
# Check how many venues were returned for each postal code
hi_list.groupby('postalCode').count().reset_index().head(2)

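| | postalCode | name | categories | address | crossStreet | id | DistanceFromHolidayInn | lat | lng | labeledLatLngs | distance | cc | city | state | country | formattedAddress | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R3A 1G7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | R3B 0P8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |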
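
In the table above, `count()` reports the number of non-missing values in each column within each postal code. That is why `neighborhood` shows 0 for both rows: those venues have no neighbourhood value. The following is a minimal sketch on hypothetical toy data that contrasts `count()` with `size()`, which counts rows whether or not their values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical venues; 'neighborhood' is missing for every row.
venues = pd.DataFrame({
    "postalCode": ["R3C 0C9", "R3C 0C9", "R3B 0P8"],
    "name": ["Venue A", "Venue B", "Venue C"],
    "neighborhood": [np.nan, np.nan, np.nan],
})

# count(): number of non-missing values per column within each postal code
per_column = venues.groupby("postalCode").count()
# size(): number of rows (venues) per postal code
per_code = venues.groupby("postalCode").size()

print(per_column)
print(per_code)
```
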


In [88]:
# Determine how many unique categories are present among the returned venues
print('There are {} unique categories.'.format(len(hi_list['categories'].unique())))

There are 7 unique categories.


#### Analyse Venues Using One-Hot Encoding

In [89]:
# one hot encoding
holidayinn_onehot = pd.get_dummies(hi_list[['categories']], prefix="", prefix_sep="")

# add postal code column back to dataframe
holidayinn_onehot['Postal Code'] = hi_list['postalCode'] 

# move postal code column to the first column
fixed_columns = [holidayinn_onehot.columns[-1]] + list(holidayinn_onehot.columns[:-1])
holidayinn_onehot = holidayinn_onehot[fixed_columns]

In [90]:
# Display the number of rows and columns that are in the dataframe
holidayinn_onehot.shape

(18, 8)

In [91]:
# Display the first two rows of the dataframe
holidayinn_onehot.head(2)

| | Postal Code | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
|---|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 32 | R3L 0C8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |


In [92]:
# Group the dataframe according to 'Postal Code', get the mean of the result and reset the index
holidayinn_categories_grouped=holidayinn_onehot.groupby('Postal Code').mean().reset_index()
holidayinn_categories_grouped.head(2)

| | Postal Code | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
|---|---|---|---|---|---|---|---|---|
| 0 | R3A 1G7 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | R3B 0P8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


In [93]:
# Display the number of rows and columns that are in the dataframe
holidayinn_categories_grouped.shape

(16, 8)
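
`holidayinn_categories_grouped` averages the one-hot indicators per postal code, so each value is the share of that postal code's venues that fall in a category. The following is a minimal sketch on hypothetical toy data in which one postal code holds two venues, producing a fractional share. It passes `dtype=int` to `pd.get_dummies` so that the indicators are 0/1 integers rather than the booleans that newer pandas versions return by default.

```python
import pandas as pd

# Hypothetical venues: two in one postal code, one in another.
venues = pd.DataFrame({
    "postalCode": ["R3C 0C9", "R3C 0C9", "R3B 0P8"],
    "categories": ["Coffee Shop", "Restaurant", "Theater"],
})

# One indicator column per category, named after the category itself.
onehot = pd.get_dummies(venues[["categories"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Postal Code", venues["postalCode"])

# The mean of each indicator within a postal code is that category's share of its venues.
grouped = onehot.groupby("Postal Code").mean().reset_index()
print(grouped)
```
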

#### Identify Most Common Venues per Postal Code

In [94]:
# Print the two most common venues within each postal code

num_top_venues = 2

for code in holidayinn_categories_grouped['Postal Code']:
    print("----"+code+"----")
    temp = holidayinn_categories_grouped[holidayinn_categories_grouped['Postal Code'] == code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----R3A 1G7----
         venue  freq
0  Coffee Shop   1.0
1  Art Gallery   0.0


----R3B 0P8----
         venue  freq
0      Theater   1.0
1  Art Gallery   0.0


----R3B 1B6----
         venue  freq
0  Coffee Shop   1.0
1  Art Gallery   0.0


----R3B 1K6----
         venue  freq
0   Restaurant   1.0
1  Art Gallery   0.0


----R3B 2E4----
         venue  freq
0    Juice Bar   1.0
1  Art Gallery   0.0


----R3B 2N7----
         venue  freq
0          Gym   1.0
1  Art Gallery   0.0


----R3C 0C9----
         venue  freq
0  Coffee Shop   1.0
1  Art Gallery   0.0


----R3C 0L8----
         venue  freq
0  Coffee Shop   1.0
1  Art Gallery   0.0


----R3C 0M6----
         venue  freq
0  Pizza Place   1.0
1  Art Gallery   0.0


----R3C 0R3----
         venue  freq
0   Restaurant   1.0
1  Art Gallery   0.0


----R3C 1V1----
         venue  freq
0  Art Gallery   1.0
1  Coffee Shop   0.0


----R3C 3L7----
         venue  freq
0   Restaurant   1.0
1  Art Gallery   0.0


----R3E 1J7----
         ven

In [95]:
# Create a function that returns the names of the top venues, sorted in descending order of frequency

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Create a new dataframe for the top venues within each postal code

In [96]:
# Define the number of top venues and the indicators
num_top_venues = len(hi_list['categories'].unique())

indicators = ['st', 'nd', 'rd']

In [97]:
# Create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions 4 onwards use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

In [98]:
# create a new dataframe
holidayinn_venues_sorted = pd.DataFrame(columns=columns)
holidayinn_venues_sorted['Postal Code'] = holidayinn_categories_grouped['Postal Code']

In [99]:
# Fill each row with the venue categories of that postal code, sorted from most to least common
for ind in np.arange(holidayinn_categories_grouped.shape[0]):
    holidayinn_venues_sorted.iloc[ind, 1:] = return_most_common_venues(holidayinn_categories_grouped.iloc[ind, :], num_top_venues)

In [100]:
# Display the new dataframe
holidayinn_venues_sorted.head(2)

| | Postal Code | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | R3A 1G7 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 1 | R3B 0P8 | Theater | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant |
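
The cells above name the ranked columns with `indicators` and a `try`/`except`, then order each row with `return_most_common_venues`. The following sketch builds the same kind of table in one helper. It assumes the layout of `holidayinn_categories_grouped`: a 'Postal Code' column followed by one frequency column per category. The names `ordinal` and `top_venues_table` are hypothetical. `ordinal` also handles 11th to 13th, and because Python's `sorted` is stable, tied categories keep their original column order.

```python
import pandas as pd


def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., including '11th' to '13th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


def top_venues_table(grouped, num_top_venues):
    """Rank the category columns of each postal code from most to least frequent."""
    freqs = grouped.set_index("Postal Code")
    rows = []
    for code, row in freqs.iterrows():
        # sorted() is stable, so categories with equal frequency keep their column order
        ranked = sorted(row.items(), key=lambda item: -item[1])
        rows.append([code] + [name for name, _ in ranked[:num_top_venues]])
    columns = ["Postal Code"] + [
        "{} Most Common Venue".format(ordinal(i + 1)) for i in range(num_top_venues)
    ]
    return pd.DataFrame(rows, columns=columns)


# Hypothetical frequencies in the same layout as holidayinn_categories_grouped.
grouped = pd.DataFrame({
    "Postal Code": ["R3A 1G7", "R3B 0P8"],
    "Art Gallery": [0.0, 0.0],
    "Coffee Shop": [1.0, 0.0],
    "Theater": [0.0, 1.0],
})
print(top_venues_table(grouped, 3))
```
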


### Create Clusters of Venues of Personal Interest

##### Group the postal codes around Holiday Inn into five clusters using k-means clustering.

In [101]:
# set number of clusters 
kclusters = 5

holidayinn_grouped_clustering = holidayinn_categories_grouped.drop(columns='Postal Code')

# run k-means clustering 
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(holidayinn_grouped_clustering) 

# check cluster labels generated for each row in the dataframe 
kmeans.labels_[0:10]



array([1, 0, 1, 2, 0, 4, 1, 1, 3, 2], dtype=int32)

##### Get the desired columns in a dataframe

In [102]:
# Create a new dataframe, 'holidayinn_data' to hold the desired columns in 'hi_list'
holidayinn_data = hi_list[['postalCode', 'name', 'categories', 'lat', 'lng', 'id', 'DistanceFromHolidayInn']]
holidayinn_data.head(2)

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn |
|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | Winnipeg Art Gallery | Art Gallery | 49.889459 | -97.150708 | 4b5f6471f964a520b4b729e3 | 0.2 |
| 32 | R3L 0C8 | Little Sister Coffee Maker | Coffee Shop | 49.878993 | -97.146772 | 5227ba6011d28632966c43a0 | 1.39 |


##### Create a dataframe that includes the cluster label as well as the top 7 venues in each postal code.

In [103]:
# add clustering labels
holidayinn_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_) 

In [104]:
# View the first two rows of 'holidayinn_venues_sorted'
holidayinn_venues_sorted.head(2)

| | Cluster Labels | Postal Code | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | R3A 1G7 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 1 | 0 | R3B 0P8 | Theater | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant |


In [105]:
# View the first two rows of 'holidayinn_data' to confirm that it contains the required columns
holidayinn_data.head(2)

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn |
|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | Winnipeg Art Gallery | Art Gallery | 49.889459 | -97.150708 | 4b5f6471f964a520b4b729e3 | 0.2 |
| 32 | R3L 0C8 | Little Sister Coffee Maker | Coffee Shop | 49.878993 | -97.146772 | 5227ba6011d28632966c43a0 | 1.39 |


In [106]:
# Start 'holidayinn_merged' from 'holidayinn_data'; the cluster columns are joined on in the next step
holidayinn_merged = holidayinn_data

In [107]:
# View the first two rows of 'holidayinn_merged' to confirm that it contains the required columns
holidayinn_merged.head(2)

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn |
|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | Winnipeg Art Gallery | Art Gallery | 49.889459 | -97.150708 | 4b5f6471f964a520b4b729e3 | 0.2 |
| 32 | R3L 0C8 | Little Sister Coffee Maker | Coffee Shop | 49.878993 | -97.146772 | 5227ba6011d28632966c43a0 | 1.39 |


In [108]:
# Join the cluster labels and most common venues from 'holidayinn_venues_sorted' onto each venue by postal code
holidayinn_merged = holidayinn_merged.join(holidayinn_venues_sorted.set_index('Postal Code'), on='postalCode')

In [109]:
# View the first two rows of the new 'holidayinn_merged' to confirm that it contains the columns from both 'holidayinn_merged' and 'holidayinn_venues_sorted'
holidayinn_merged.head(2) # The last eight columns show the cluster label and the seven most common venues

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | Winnipeg Art Gallery | Art Gallery | 49.889459 | -97.150708 | 4b5f6471f964a520b4b729e3 | 0.2 | 0.0 | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 32 | R3L 0C8 | Little Sister Coffee Maker | Coffee Shop | 49.878993 | -97.146772 | 5227ba6011d28632966c43a0 | 1.39 | 1.0 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |


In [110]:
holidayinn_merged.isnull().head(2)

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |


In [111]:
holidayinn_merged = holidayinn_merged.dropna()

In [112]:
holidayinn_merged.head(2)

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | Winnipeg Art Gallery | Art Gallery | 49.889459 | -97.150708 | 4b5f6471f964a520b4b729e3 | 0.2 | 0.0 | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 32 | R3L 0C8 | Little Sister Coffee Maker | Coffee Shop | 49.878993 | -97.146772 | 5227ba6011d28632966c43a0 | 1.39 | 1.0 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |


In [113]:
holidayinn_merged.isnull().head(2)

| | postalCode | name | categories | lat | lng | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |


In [114]:
holidayinn_merged['Cluster Labels'] = holidayinn_merged['Cluster Labels'].astype(int)
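
The `join`, `dropna` and `astype(int)` steps above can be reproduced on hypothetical toy data. When a venue's postal code has no matching row in the cluster table (in this sketch, a venue with no postal code), the joined columns for that venue are `NaN`. Because of those missing values, `Cluster Labels` is stored as a float, which is consistent with the `0.0` and `1.0` labels in the output above. After the unmatched rows are dropped, the labels can be cast back to integers. This sketch applies the cast with `astype({...})` on the result of `dropna()`.

```python
import numpy as np
import pandas as pd

# Hypothetical venues; the third has no postal code.
venues = pd.DataFrame({
    "postalCode": ["R3C 1V1", "R3L 0C8", np.nan],
    "name": ["Venue A", "Venue B", "Venue C"],
})

# Hypothetical per-postal-code table carrying the cluster labels.
sorted_venues = pd.DataFrame({
    "Postal Code": ["R3C 1V1", "R3L 0C8"],
    "Cluster Labels": [0, 1],
    "1st Most Common Venue": ["Art Gallery", "Coffee Shop"],
})

joined = venues.join(sorted_venues.set_index("Postal Code"), on="postalCode")
print(joined["Cluster Labels"].dtype)  # float64, because the unmatched row holds NaN

# Drop venues that matched no postal code, then restore integer labels.
merged = joined.dropna().astype({"Cluster Labels": int})
print(merged)
```
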

#### Display Venue Clusters

In [115]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=16)

# add a yellow-filled circle marker with a green outline to represent the Holiday Inn
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='green',
    popup='Holiday Inn',
    fill=True,
    fill_color='yellow',
    fill_opacity=0.6
).add_to(map_clusters)

# set color scheme for the clusters 
x = np.arange(kclusters) 
ys = [i + x + (i*x)**2 for i in range(kclusters)] 
colors_array = cm.rainbow(np.linspace(0, 1, len(ys))) 
rainbow = [colors.rgb2hex(i) for i in colors_array] 

# add markers to the map 
markers_colors = [] 
for lat, lon, poi, cluster in zip(holidayinn_merged['lat'], holidayinn_merged['lng'], holidayinn_merged['postalCode'], holidayinn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon], 
        radius=5, 
        popup=label, 
        color=rainbow[cluster],  # cluster labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Examine Venues of Personal Interest per Cluster

##### Examine each cluster for venues of personal interest and use the estimated travel distance to plan the outings.

### Cluster 0

In [116]:
Cluster0 = holidayinn_merged.loc[holidayinn_merged['Cluster Labels'] == 0, holidayinn_merged.columns[[0] + [1] + [2] + list(range(5, holidayinn_merged.shape[1]))]]
Cluster0

| | postalCode | name | categories | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R3C 1V1 | Winnipeg Art Gallery | Art Gallery | 4b5f6471f964a520b4b729e3 | 0.2 | 0 | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 6 | R3B 2E4 | Booster Juice | Juice Bar | 4b8019a1f964a5202d5230e3 | 0.1 | 0 | Juice Bar | Art Gallery | Coffee Shop | Gym | Pizza Place | Restaurant | Theater |
| 68 | R3B 0P8 | The John Hirsch Theatre | Theater | 4b46c037f964a520c62726e3 | 1.35 | 0 | Theater | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Restaurant |


In [117]:
# Determine the most frequently occurring venue in each of the first five ranked columns of Cluster0
print('The family may visit the following common venues "{}" "{}" "{}" "{}" "{}" in this cluster.'.format(
    Cluster0.mode()['1st Most Common Venue'][0], Cluster0.mode()['2nd Most Common Venue'][0], 
    Cluster0.mode()['3rd Most Common Venue'][0], Cluster0.mode()['4th Most Common Venue'][0], 
    Cluster0.mode()['5th Most Common Venue'][0]))

The family may visit the following common venues "Art Gallery" "Art Gallery" "Coffee Shop" "Gym" "Pizza Place" in this cluster.


### Cluster 1

In [118]:
Cluster1 = holidayinn_merged.loc[holidayinn_merged['Cluster Labels'] == 1, holidayinn_merged.columns[[0] + [1] + [2] + list(range(5, holidayinn_merged.shape[1]))]]
Cluster1

| | postalCode | name | categories | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | R3L 0C8 | Little Sister Coffee Maker | Coffee Shop | 5227ba6011d28632966c43a0 | 1.39 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 24 | R3B 1B6 | Parlour Coffee | Coffee Shop | 4e5a6f8ad4c0ba8c11ab095d | 1.15 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 4 | R3C 0L8 | Starbucks | Coffee Shop | 4b673b2af964a520b3422be3 | 0.25 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 95 | R3E 1J7 | Starbucks | Coffee Shop | 513b7dd6e4b0e75b720eed5f | 1.43 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 3 | R3M 1Y1 | Thom Bargen coffee & tea | Coffee Shop | 56e0429fcd10ce07507635b3 | 0.24 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 20 | R3C 0C9 | Tim Hortons | Coffee Shop | 4b5d981ff964a520cb6229e3 | 0.15 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |
| 94 | R3A 1G7 | Tim Hortons | Coffee Shop | 54e8a5d4498e9af95a77667c | 1.41 | 1 | Coffee Shop | Art Gallery | Gym | Juice Bar | Pizza Place | Restaurant | Theater |


In [119]:
# Determine the most frequently occurring venue in each of the first five ranked columns of Cluster1
print('The family may visit the following common venues "{}" "{}" "{}" "{}" "{}" in this cluster.'.format(
    Cluster1.mode()['1st Most Common Venue'][0], Cluster1.mode()['2nd Most Common Venue'][0], 
    Cluster1.mode()['3rd Most Common Venue'][0], Cluster1.mode()['4th Most Common Venue'][0], 
    Cluster1.mode()['5th Most Common Venue'][0]))

The family may visit the following common venues "Coffee Shop" "Art Gallery" "Gym" "Juice Bar" "Pizza Place" in this cluster.


### Cluster 2

In [120]:
Cluster2 = holidayinn_merged.loc[holidayinn_merged['Cluster Labels'] == 2, holidayinn_merged.columns[[0] + [1] + [2] + list(range(5, holidayinn_merged.shape[1]))]]
Cluster2

| | postalCode | name | categories | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | R3C 3L7 | Harvey's | Restaurant | 4edd1bb277c8274e005b3e73 | 0.41 | 2 | Restaurant | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Theater |
| 61 | R3C 0R3 | Prairie 360 | Restaurant | 5203fd442fc63a7e19f4f4cb | 1.16 | 2 | Restaurant | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Theater |
| 13 | R3B 1K6 | deer + almond | Restaurant | 4f5832dfe4b032fce5f1f4a1 | 0.97 | 2 | Restaurant | Art Gallery | Coffee Shop | Gym | Juice Bar | Pizza Place | Theater |


In [121]:
# Determine the most frequently occurring venue in each of the first five ranked columns of Cluster2
print('The family may visit the following common venues "{}" "{}" "{}" "{}" "{}" in this cluster.'.format(
    Cluster2.mode()['1st Most Common Venue'][0], Cluster2.mode()['2nd Most Common Venue'][0], 
    Cluster2.mode()['3rd Most Common Venue'][0], Cluster2.mode()['4th Most Common Venue'][0], 
    Cluster2.mode()['5th Most Common Venue'][0]))

The family may visit the following common venues "Restaurant" "Art Gallery" "Coffee Shop" "Gym" "Juice Bar" in this cluster.


### Cluster 3

In [122]:
Cluster3 = holidayinn_merged.loc[holidayinn_merged['Cluster Labels'] == 3, holidayinn_merged.columns[[0] + [1] + [2] + list(range(5, holidayinn_merged.shape[1]))]]
Cluster3

| | postalCode | name | categories | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90 | R3L 1Y5 | A Little Pizza Heaven | Pizza Place | 4b867986f964a520b88b31e3 | 1.49 | 3 | Pizza Place | Art Gallery | Coffee Shop | Gym | Juice Bar | Restaurant | Theater |
| 26 | R3C 0M6 | Carbone Coal Fired Pizza (Downtown) | Pizza Place | 53973e30498eb27aa283b4dd | 0.84 | 3 | Pizza Place | Art Gallery | Coffee Shop | Gym | Juice Bar | Restaurant | Theater |


In [123]:
# Determine the most frequently occurring venue in each of the first five ranked columns of Cluster3
print('The family may visit the following common venues "{}" "{}" "{}" "{}" "{}" in this cluster.'.format(
    Cluster3.mode()['1st Most Common Venue'][0], Cluster3.mode()['2nd Most Common Venue'][0], 
    Cluster3.mode()['3rd Most Common Venue'][0], Cluster3.mode()['4th Most Common Venue'][0], 
    Cluster3.mode()['5th Most Common Venue'][0]))

The family may visit the following common venues "Pizza Place" "Art Gallery" "Coffee Shop" "Gym" "Juice Bar" in this cluster.


### Cluster 4

In [124]:
Cluster4 = holidayinn_merged.loc[holidayinn_merged['Cluster Labels'] == 4, holidayinn_merged.columns[[0] + [1] + [2] + list(range(5, holidayinn_merged.shape[1]))]]
Cluster4

| | postalCode | name | categories | id | DistanceFromHolidayInn | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | R3B 2N7 | YMCA-YWCA of Winnipeg | Gym | 4afb42e7f964a520751c22e3 | 0.18 | 4 | Gym | Art Gallery | Coffee Shop | Juice Bar | Pizza Place | Restaurant | Theater |


In [125]:
# Determine the most frequently occurring venue in each of the first five ranked columns of Cluster4
print('The family may visit the following common venues "{}" "{}" "{}" "{}" "{}" in this cluster.'.format(
    Cluster4.mode()['1st Most Common Venue'][0], Cluster4.mode()['2nd Most Common Venue'][0], 
    Cluster4.mode()['3rd Most Common Venue'][0], Cluster4.mode()['4th Most Common Venue'][0], 
    Cluster4.mode()['5th Most Common Venue'][0]))

The family may visit the following common venues "Gym" "Art Gallery" "Coffee Shop" "Juice Bar" "Pizza Place" in this cluster.
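
The five cells above repeat the same filter-and-mode steps for each cluster label. The following sketch writes the summary once as a function. The name `summarise_cluster`, the shortened `VENUE_COLUMNS` list and the toy table are hypothetical. `DataFrame.mode()` lists tied values in sorted order, so `[0]` (here `iloc[0]`) picks the alphabetically first venue when several are equally common. In the '1st Most Common Venue' column of Cluster 0, three different venues each appear once, which is why the Cluster 0 summary above starts with "Art Gallery".

```python
import pandas as pd

VENUE_COLUMNS = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]


def summarise_cluster(merged, label, venue_columns):
    """Return the most frequent value of each ranked-venue column within one cluster."""
    cluster = merged.loc[merged["Cluster Labels"] == label]
    # mode() lists tied values in sorted order, so iloc[0] takes the
    # alphabetically first of several equally common venues.
    modes = cluster[venue_columns].mode()
    return [modes[col].iloc[0] for col in venue_columns]


# Hypothetical merged table with two clusters.
merged = pd.DataFrame({
    "name": ["Venue A", "Venue B", "Venue C"],
    "Cluster Labels": [1, 1, 2],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Restaurant"],
    "2nd Most Common Venue": ["Art Gallery", "Art Gallery", "Art Gallery"],
    "3rd Most Common Venue": ["Gym", "Gym", "Coffee Shop"],
})

for label in sorted(merged["Cluster Labels"].unique()):
    venues = summarise_cluster(merged, label, VENUE_COLUMNS)
    print("Cluster {}: {}".format(label, ", ".join('"{}"'.format(v) for v in venues)))
```
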


## Conclusion
The programme
- streamlined the process of searching for available venues at a particular location and produced a list of preferred venues with appropriate attributes.
- made retrieving the required venue information easier and faster. The runtime is under one minute, which shortens the turnaround time for requests to plan visits to different holiday venues.
- made it easy to modify a vacation itinerary if something unexpected happens, because each parameter can be altered as required and the results regenerated.

## Acknowledgement
Thanks to Foursquare for making the Foursquare API available to developers and for making venue data easy to access and use.

Gratitude to IBM for the comprehensive training in Data Science and for making IBM Watson Studio accessible throughout this course.

Thanks to the Python community for making the journey to learn Python programming language easier.