## Introduction

Trip Scheduler

If I were going on a trip, my first step would be to gather a list of tourist attractions in the destination city.  I would then choose the ones that interest me.  For example, if I were interested in culture, I would plan to visit museums; if I were interested in art, I would visit more of the city's art galleries.  The most attractive places to visit in a city vary from person to person.

Once the places of interest were identified, I would sort the list and then plan the details, such as which places to visit on which day.  It makes sense to group places that are located near each other and visit them on the same day.  The geographic location of each place makes this grouping possible.

The Trip Scheduler helps the tourist develop this kind of plan.  The tourist enters the name of the city and the number of days he/she will stay there.  Trip Scheduler analyzes this input and produces the following results:

1.  List of Tourist attractions that can be visited on each day of their stay.  

2.  City map with the locations of the tourist attractions marked.  The places are marked with colors, each color representing the day on which the place is visited.

## Data

Geocoder data (Nominatim, via geopy) is used to identify the latitude and longitude of the city.

The following Foursquare data are needed by the Trip Scheduler tool:

1.	List of venues in a specified geographic location.

2.	Categories of each venue.  This is required since the tool produces its result based on the user's choice of categories.

3.	Location data (latitude and longitude) and rating information for each venue.  This is required to pick the top N places to visit and to group them based on their proximity to each other.
 
Foursquare data is the main dataset being used to get the above data.  

1.	Venues/Categories API:  This API provides the list of venue categories supported in the dataset.  The tourism-related categories are selected from the response and displayed to the user, who can then select the categories of interest from this list.  For example, assume the response contains the following categories:  Restaurants, Shopping Malls, Museums, Sports Activities, Art Galleries, Parks and Palaces.  The tourist could choose Museums, Art Galleries and Parks as categories of interest.  The tool would then select places only from these categories.

2.	Venues/Search API:  This API provides the list of venues that match the selected categories in the specified location.  The fields of interest are venue name, venue ID, venue category, venue latitude and venue longitude.

3.	Venues/Venue ID API:  This API is used to get the details of each venue.  The fields of interest are ‘Rating’ and ‘Likes’.  The venues are sorted by these ‘Rating’ and ‘Likes’ values, and the top N venues are picked from the sorted list.

The top N venues would be segmented using the K-Means clustering algorithm.  The attributes used for clustering would be the latitude and longitude of the venue in order to get the tourist places segmented based on the geographical location.  Each segment represents the day on which the places are visited.

## Methodology

### Step 1: Import needed libraries, turn off warnings, load Foursquare credentials

In [305]:
import requests
import pandas as pd
pd.set_option('display.max_rows', 500)

import warnings
warnings.filterwarnings('ignore') #Turn off warnings

#!conda install -c conda-forge geocoder -y
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

print('Libraries imported.')

Libraries imported.


In [323]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


### Step 2:  Input the location to visit (city), number of days, and categories

In [324]:
vlocation = 'Bentonville, AR'
vdays = 3 #Number of days
perday = 4 #number of venues per day
tourism_categories = ['Theme Park', 'Park', 'National Park', 'Botanical Garden', 'Museum', 'Palace', 'Temple', 
                      'Aquarium', 'Planetarium', 'Zoo', 'Monument / Landmark', 'Capitol Building', 'Spiritual Center']  # Only a limited list for testing

### Step 3: Get the latitude and longitude of the location (Geocoders)

In [325]:
geolocator = Nominatim(user_agent='trip_scheduler')  # recent geopy versions require a user_agent
location = geolocator.geocode(vlocation)
lat = location.latitude
lng = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(vlocation, lat, lng))

The geographical coordinates of Bentonville, AR are 36.3728538, -94.2088172.
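
`geocode` returns `None` when Nominatim cannot resolve the query, in which case reading `location.latitude` would fail with an `AttributeError`. A minimal sketch of a guard is shown here. The names `resolve_coordinates` and `fake_geocode` are hypothetical and introduced only for illustration; the stub stands in for `geolocator.geocode` so that the example runs offline.

```python
from types import SimpleNamespace


def resolve_coordinates(geocode, place):
    """Return (latitude, longitude) for `place`, or raise if nothing was found.

    `geocode` is any callable that behaves like geopy's Nominatim.geocode:
    it returns an object with .latitude and .longitude, or None.
    """
    location = geocode(place)
    if location is None:
        raise ValueError('Could not geocode {!r}'.format(place))
    return location.latitude, location.longitude


def fake_geocode(place):
    # Hypothetical offline stub; it only knows the city used in this notebook.
    known = {
        'Bentonville, AR': SimpleNamespace(latitude=36.3728538,
                                           longitude=-94.2088172),
    }
    return known.get(place)


lat, lng = resolve_coordinates(fake_geocode, 'Bentonville, AR')
```

In the notebook itself the call would presumably be `resolve_coordinates(geolocator.geocode, vlocation)`.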


### Step 4: Get the list of supported venue categories (Foursquare)

In [326]:
cat_url = 'https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
cat_result = requests.get(cat_url).json()

### Step 5: Get applicable tourism categories 

In [327]:
fs_tourism_cat_list = []
fs_tourism_cat_id = []
for c1 in cat_result['response']['categories']:
    if c1['name'] in tourism_categories:
        fs_tourism_cat_list.append([c1['name'],c1['id']])
        fs_tourism_cat_id.append(c1['id'])
    for c2 in c1['categories']:
        if c2['name'] in tourism_categories:
                fs_tourism_cat_list.append([c2['name'],c2['id']])
                fs_tourism_cat_id.append(c2['id'])
        if (len(c2['categories']) != 0):
            for c3 in c2['categories']:
                if c3['name'] in tourism_categories:
                    fs_tourism_cat_list.append([c3['name'],c3['id']])
                    fs_tourism_cat_id.append(c3['id'])
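
The loops above walk the category tree exactly three levels deep. If the tree were deeper, a recursive walk would collect matches at any depth. The following is a sketch only: `collect_categories` is a hypothetical helper, and `sample_tree` is made-up data that mimics the nested `name` / `id` / `categories` shape used above, not a real API response.

```python
def collect_categories(categories, wanted):
    """Depth-first walk of a nested category list; returns (name, id) pairs."""
    matches = []
    for cat in categories:
        if cat['name'] in wanted:
            matches.append((cat['name'], cat['id']))
        # Recurse into sub-categories, if any.
        matches.extend(collect_categories(cat.get('categories', []), wanted))
    return matches


# Made-up tree with hypothetical IDs, for illustration only.
sample_tree = [
    {'name': 'Arts & Entertainment', 'id': 'a1', 'categories': [
        {'name': 'Museum', 'id': 'a2', 'categories': [
            {'name': 'Art Museum', 'id': 'a3', 'categories': []},
        ]},
        {'name': 'Zoo', 'id': 'a4', 'categories': []},
    ]},
    {'name': 'Outdoors & Recreation', 'id': 'b1', 'categories': [
        {'name': 'Park', 'id': 'b2', 'categories': []},
    ]},
]

found = collect_categories(sample_tree, {'Museum', 'Zoo', 'Park'})
found_ids = [cat_id for _, cat_id in found]
```

The visiting order matches the nested loops: a category is checked first, then its children.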

### Step 6: Get the list of Venues matching the tourism categories

In [328]:
# Parameters for the foursquare APIs
LIMIT=100
radius = 8000  # in meters (8 km)

results = []
for index, cid in enumerate(fs_tourism_cat_id):
    url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            cid,
            radius, 
            LIMIT)
    results.append(requests.get(url).json())
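
Formatting the query string by hand works here, but it does not escape special characters. A minimal sketch of an alternative builds the same URL with the standard library's `urlencode`. `build_search_url` is a hypothetical helper, and the argument values in the example call are placeholders.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, version,
                     lat, lng, category_id, radius, limit):
    """Build the venues/search URL with properly escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "lat,lng" pair
        'categoryId': category_id,
        'radius': radius,                # meters
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


# Placeholder values, for illustration only.
example_url = build_search_url('ID', 'SECRET', '20180604',
                               36.5, -94.25, 'abc', 8000, 100)
```

With `requests`, passing the same dictionary as `params=` to `requests.get` gives equivalent escaping, and a `timeout=` argument keeps a stalled request from blocking the loop.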

### Step 7: Convert to a DataFrame with columns for venue name, latitude, longitude, category and ID

In [329]:
venues_list=[]
for res in results:
    if (len(res['response'])  != 0):
        for v in res['response']['venues']:
            # return only relevant information for each nearby venue
            if (len(v['categories']) != 0):
                venues_list.append((
                    v['name'],         
                    v['location']['lat'], 
                    v['location']['lng'],
                    v['categories'][0]['name'],
                    v['id']))
                
travel_venues = pd.DataFrame(venues_list)
travel_venues.columns = ['Name', 'Latitude', 'Longtitude', 'Category','Ident']
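
Because Step 6 sends one request per category, the same venue could in principle be returned by more than one request. This is an assumption; the output shown in this notebook does not confirm it. A sketch of removing such repeats before building the DataFrame follows, where `dedupe_venues` is a hypothetical helper and the sample tuples are made up.

```python
def dedupe_venues(venues):
    """Keep the first occurrence of each venue.

    Each venue is a tuple whose last element is the venue ID,
    matching the tuples appended to venues_list above.
    """
    seen = set()
    unique = []
    for venue in venues:
        venue_id = venue[-1]
        if venue_id not in seen:
            seen.add(venue_id)
            unique.append(venue)
    return unique


# Made-up sample: the second and third rows share an ID.
sample = [
    ('Park A', 36.1, -94.1, 'Park', 'id1'),
    ('Museum B', 36.2, -94.2, 'Museum', 'id2'),
    ('Museum B', 36.2, -94.2, 'Museum', 'id2'),
]
unique_sample = dedupe_venues(sample)
```

On the DataFrame itself, `travel_venues.drop_duplicates(subset='Ident')` would have the same effect.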

### Step 8: Get the details of each travel venue
#### Not implemented because it requires one call to the venues/VENUE_ID API per venue, and the account used here is a Sandbox account, which limits such calls.

### Step 9: Add the columns 'Rating' and 'Likes' to the dataframe based on the values received from the Venue details
#### Not implemented for the same reason as Step 8.

### Workaround for Steps 8 and 9: randomly assign Likes and Rating
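
For reference, a sketch of how Steps 8 and 9 might read the two fields from a venue-details response once the calls are available. The response shape assumed here (`response` → `venue` → `rating` and `likes.count`) is an assumption about the API and was not exercised in this notebook; `extract_rating_and_likes` is a hypothetical helper and `sample_details` is made-up data.

```python
def extract_rating_and_likes(details):
    """Pull (rating, likes) out of a venue-details payload.

    Missing fields fall back to None (rating) and 0 (likes), since
    a venue may not have a rating yet.
    """
    venue = details.get('response', {}).get('venue', {})
    rating = venue.get('rating')
    likes = venue.get('likes', {}).get('count', 0)
    return rating, likes


# Made-up payload in the assumed shape, for illustration only.
sample_details = {
    'response': {
        'venue': {
            'id': 'abc123',
            'name': 'Example Museum',
            'rating': 8.7,
            'likes': {'count': 42},
        }
    }
}

rating, likes = extract_rating_and_likes(sample_details)
```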

In [330]:
np.random.seed(12345) # For reproducibility
travel_venues['Rating'] = np.random.randint(4,11, size=len(travel_venues))
travel_venues['Likes'] = np.random.randint(6,11, size=len(travel_venues))

### Step 10: Sort the dataframe based on the Rating and Likes.  Pick up top N travel venues.

In [331]:
travel_venues_sorted = travel_venues.sort_values(['Rating', 'Likes'], ascending=[False, False]) 
N = vdays * perday  
top_travel_venues = travel_venues_sorted[:N] # top N venues to cover over the whole stay
top_travel_venues

Unnamed: 0,Name,Latitude,Longtitude,Category,Ident,Rating,Likes
12,Buckminster Fuller's Fly's Eye Dome,36.383526,-94.203559,Museum,598fd8ea3149b904bbe12033,10,10
18,The Land of Happy Monkeys,36.362743,-94.181044,Zoo,4db83cfdfa8c377d83ba4261,10,10
77,The Church of Jesus Christ of Latter-day Saints,36.358282,-94.142592,Church,4f731f857bebc1de47d22e45,10,10
86,Park Street Baptist Church,36.384324,-94.208324,Church,5aac43ad1f7440585774573f,10,10
15,Daisy Airgun Museum,36.355182,-94.119768,Museum,4c34e4f17cc0c9b6881ef49a,10,9
41,Foerster Park,36.342422,-94.120547,Park,4c607cc590b2c9b60bea3c22,10,9
103,S.O.D POD,36.360391,-94.182654,Shrine,50916d7d498e8c94e3d01bcf,10,9
27,Crystal Bridges Trail,36.378921,-94.205425,Trail,4eed206c775b3c580d25c2f3,10,8
71,First United Methodist Church Bentonville,36.372896,-94.210764,Church,4d585e75afe4b60c3cf84f61,10,8
89,oakley chapel,36.333685,-94.174427,Church,569fc05c498e2db2bfe9ac1a,10,8


### Step 11: Form a new dataframe by dropping everything except Lat and Long

In [332]:
#clustering_venues=top_travel_venues.loc[:,['Name','Category', 'Latitude','Longtitude']]
clustering_venues=top_travel_venues.loc[:,['Latitude','Longtitude']]
clustering_venues

Unnamed: 0,Latitude,Longtitude
12,36.383526,-94.203559
18,36.362743,-94.181044
77,36.358282,-94.142592
86,36.384324,-94.208324
15,36.355182,-94.119768
41,36.342422,-94.120547
103,36.360391,-94.182654
27,36.378921,-94.205425
71,36.372896,-94.210764
89,36.333685,-94.174427


### Step 12: Use K-Means clustering to segment venues based on the Latitude and Longitude values

In [334]:
# set number of clusters
kclusters = vdays

# run k-means clustering
kmeans = KMeans(init = "k-means++", n_clusters=kclusters, n_init = 12).fit(clustering_venues)


# check cluster labels generated for each row in the dataframe
kmeans.labels_

venues_grouped = top_travel_venues.loc[:,['Name', 'Category', 'Latitude','Longtitude']]

# add clustering labels
venues_grouped['Cluster Labels'] = kmeans.labels_
venues_grouped

Unnamed: 0,Name,Category,Latitude,Longtitude,Cluster Labels
12,Buckminster Fuller's Fly's Eye Dome,Museum,36.383526,-94.203559,0
18,The Land of Happy Monkeys,Zoo,36.362743,-94.181044,0
77,The Church of Jesus Christ of Latter-day Saints,Church,36.358282,-94.142592,1
86,Park Street Baptist Church,Church,36.384324,-94.208324,0
15,Daisy Airgun Museum,Museum,36.355182,-94.119768,1
41,Foerster Park,Park,36.342422,-94.120547,1
103,S.O.D POD,Shrine,36.360391,-94.182654,0
27,Crystal Bridges Trail,Trail,36.378921,-94.205425,0
71,First United Methodist Church Bentonville,Church,36.372896,-94.210764,0
89,oakley chapel,Church,36.333685,-94.174427,1
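
K-Means measures Euclidean distance on the raw degree values, but a degree of longitude covers less ground than a degree of latitude away from the equator. For a single city the effect may be small. As a sketch of one possible refinement, not something this notebook does, the coordinates could be converted to approximate kilometer offsets before clustering. `to_local_km` is a hypothetical helper, and 111.32 km per degree is a commonly used approximation.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def to_local_km(lat, lng, ref_lat, ref_lng):
    """Equirectangular approximation: (east, north) offsets in km
    from a reference point such as the city center."""
    east = (lng - ref_lng) * KM_PER_DEGREE * math.cos(math.radians(ref_lat))
    north = (lat - ref_lat) * KM_PER_DEGREE
    return east, north


# One degree north and one degree east of a reference point at 36 N.
east_km, north_km = to_local_km(37.0, -93.0, 36.0, -94.0)
```

At this latitude the eastward offset comes out noticeably shorter than the northward one, which is the distortion the raw degrees hide.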


## Results

### Step 13: Display the list of venues (for each day for 3 days)

#### Day 1

In [335]:
venues_grouped.loc[venues_grouped['Cluster Labels'] == 0, ['Name', 'Category']]

Unnamed: 0,Name,Category
12,Buckminster Fuller's Fly's Eye Dome,Museum
18,The Land of Happy Monkeys,Zoo
86,Park Street Baptist Church,Church
103,S.O.D POD,Shrine
27,Crystal Bridges Trail,Trail
71,First United Methodist Church Bentonville,Church


#### Day 2

In [336]:
venues_grouped.loc[venues_grouped['Cluster Labels'] == 1, ['Name', 'Category']]

Unnamed: 0,Name,Category
77,The Church of Jesus Christ of Latter-day Saints,Church
15,Daisy Airgun Museum,Museum
41,Foerster Park,Park
89,oakley chapel,Church
104,Church,Church


#### Day 3

In [337]:
venues_grouped.loc[venues_grouped['Cluster Labels'] == 2, ['Name', 'Category']]

Unnamed: 0,Name,Category
57,Bentonville Bike Trail,Park


### Step 14: Use Folium to visualize the travel venues on the City map.

#### (Day 1 in Red, Day 2 in Purple, and Day 3 in Green)

In [338]:
# create map
map_clusters = folium.Map(location=[lat, lng], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(venues_grouped['Latitude'], venues_grouped['Longtitude'], venues_grouped['Name'], venues_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
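
In the cell above, `rainbow[cluster-1]` means that cluster 0 picks the last palette entry, because index -1 wraps around in Python. A sketch of a more explicit mapping follows, assuming the color names from the heading and that cluster label k corresponds to Day k+1 as in Step 13. `DAY_COLORS` and `color_for_cluster` are hypothetical names.

```python
# Day 1, Day 2, Day 3, in the order given in the heading above.
DAY_COLORS = ['red', 'purple', 'green']


def color_for_cluster(cluster):
    """Map a cluster label (0, 1, 2, ...) to a color name.

    Colors repeat if there are more clusters than entries.
    """
    return DAY_COLORS[cluster % len(DAY_COLORS)]


example_colors = [color_for_cluster(c) for c in range(4)]
```

The returned name could be passed as both `color` and `fill_color` to `folium.CircleMarker`.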

## Discussion

The list of places assigned to a day can be longer than the number we would like to visit (`perday`), because K-Means used only latitude and longitude to form the clusters and does not limit cluster size.  In the run above, for example, Day 1 has six venues while Day 3 has one.

Future improvements could add inputs such as the opening hours of each venue and the average number of hours visitors spend there; the algorithm could then split these lists further.
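
One simple way to limit a day's list using only the fields already available is to keep the best-ranked `perday` venues in each cluster and set the rest aside. This is a sketch under that assumption, not part of the tool as implemented; `cap_per_day` is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict


def cap_per_day(venues, perday):
    """Keep at most `perday` venues per cluster, ranked by rating then likes.

    venues: iterable of (name, cluster, rating, likes) tuples.
    Returns (schedule, overflow): schedule maps cluster -> kept venues,
    overflow lists the venues that did not fit.
    """
    by_day = defaultdict(list)
    for venue in venues:
        by_day[venue[1]].append(venue)

    schedule, overflow = {}, []
    for day, items in by_day.items():
        ranked = sorted(items, key=lambda v: (v[2], v[3]), reverse=True)
        schedule[day] = ranked[:perday]
        overflow.extend(ranked[perday:])
    return schedule, overflow


# Made-up rows: three venues in cluster 0, one in cluster 1.
sample = [
    ('A', 0, 10, 10),
    ('B', 0, 10, 9),
    ('C', 0, 9, 10),
    ('D', 1, 8, 6),
]
schedule, overflow = cap_per_day(sample, perday=2)
```

The overflow venues could then be offered to days that have room, which this sketch leaves out.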

## Conclusion

When someone plans a trip to an unfamiliar city, planning is difficult because they know little about the city.  This tool can help.  It uses the traveler's preferred venue categories, the number of days of the stay, and venue ratings and likes (randomly generated in this notebook).  This makes it easy for the traveler to build a schedule without visiting multiple web pages.
