# **Capstone Project - The Battle of Neighbourhoods (Week 2)**

# **Choosing a Resort for Stay**

## Table of Contents

* [1.Introduction: Business Problem](##1.Introduction)
* [2.Data](##2.Data)
* [3.Methodology](##3.Methodology)
* [4.Analysis](##4.Analysis)
* [5.Results and Discussion](##5.Results)
* [6.Conclusion](##6.Conclusion)

## 1. Introduction

In this project we try to find the resort that best suits a tourist's preferences in *Kodaikanal, Tamil Nadu*, a tourist destination in India.

**Kodaikanal** is a hill town in the southern Indian state of Tamil Nadu. It’s set in an area of granite cliffs, forested valleys, lakes, waterfalls and grassy hills, and its scenic beauty makes it a popular choice with tourists. Since there are many resorts in Kodaikanal, we try to identify locations that suit travellers with different interests. Some tourists prefer resorts closer to the town, others prefer resorts closer to tourist spots, and so on. We provide the details of each location so that tourists can pick the resort that suits them.

## 2. Data

The factors that will influence our decision are:

 - Venues close to the resort

 - Rating of the resort

 - Number of likes for the resort

The following data sources are needed to extract or generate the required information:

 - Resorts in Kodaikanal are obtained using the Foursquare API

 - Resort details such as the rating and the number of likes are obtained using the Foursquare premium API

 - The number, type and location of restaurants and tourist spots within a 2 km radius of every resort are obtained using the Foursquare API
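
As a rough illustration of how such a request can be assembled, the sketch below builds a Foursquare `venues/explore` URL from a parameter dictionary. The endpoint and parameter names are the ones used in the notebook cells further down; the helper name `build_explore_url`, the rounded coordinates and the placeholder credentials are assumptions made for this example only.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit, category_id=None):
    """Build a venues/explore URL (hypothetical helper, not part of the notebook)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,  # metres
        'limit': limit,
    }
    # the category filter is optional
    if category_id is not None:
        params['categoryId'] = category_id
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# placeholder credentials; real values come from a Foursquare developer account
example_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20210120',
                                10.2733, 77.5116, radius=30000, limit=100)
print(example_url)
```

Building the query string with `urlencode` keeps each parameter next to its name, which is easier to check than a long positional format string.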


In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# install geopy; comment this line out if it is already installed
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# install folium; comment this line out if it is already installed
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


In [4]:
address = 'Kodaikanal,Tamil Nadu'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Kodaikanal are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Kodaikanal are 10.273275349999999, 77.51160822153315.


In [5]:
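import os

# read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names below are placeholders; use whatever names you export)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20210120' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
radius = 30000 # search radius in metres (30 km)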

In [6]:
# Foursquare category ID used to restrict the search to hotels and resorts
category_id = '4bf58dd8d48988d1fa931735'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    latitude,
    longitude,
    radius,
    LIMIT,
    category_id)

# make the GET request
results = requests.get(url).json()["response"]['groups'][0]['items']

In [7]:
# keep only the fields needed for each resort
venues_list = [(
    v['venue']['name'],
    v['venue']['id'],
    v['venue']['location']['lat'],
    v['venue']['location']['lng'],
    v['venue']['categories'][0]['name']) for v in results]

df_resorts = pd.DataFrame(venues_list, columns=['Name','Venue ID','Latitude','Longitude','Category'])

In [8]:
df_resorts

| | Name | Venue ID | Latitude | Longitude | Category |
|---|---|---|---|---|---|
| 0 | Carlton Hotel | 4c1601317f7f2d7f743be368 | 10.234211 | 77.488522 | Hotel |
| 1 | The Kodaikanal Club | 4c34be637cc0c9b6f9d9f39a | 10.234079 | 77.491351 | Resort |
| 2 | Hotel Kodai International | 4d59fbd17e22370438f5b773 | 10.242568 | 77.498038 | Hotel |
| 3 | Apple Valley Resort | 50a20b69e4b02fa5344ce28c | 10.236985 | 77.495837 | Hotel |
| 4 | Hotel Cliffton | 4d58dcde7e2237043617aa73 | 10.238823 | 77.484139 | Hotel |
| 5 | Villa Retreat | 4c164afda9c220a108605a9d | 10.232191 | 77.494358 | Hotel |
| 6 | Hotel Vel's Court | 4ef095ae77c8053fbd0fc24a | 10.441988 | 77.516294 | Hotel |
| 7 | Sterling valley view resort | 4dca3084b0fb9c8f8ae0172d | 10.268499 | 77.490931 | Resort |
| 8 | Sterling Holidays, Kodai - Valley View | 51b87edc7dd2c2c158dd08ee | 10.268134 | 77.48889 | Resort |
| 9 | GRT Nature Trails. Kodaikanal | 5621aaa7498e25669673fd3e | 10.27546 | 77.486388 | Resort |


In [9]:
# create map
map_resorts = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to the map

for lat, lon, poi in zip(df_resorts['Latitude'], df_resorts['Longitude'], df_resorts['Name']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.Marker(
        [lat, lon],
        popup=label).add_to(map_resorts)

map_resorts

## 3. Methodology

In this project we aim to help a tourist choose a resort that suits their taste for a stay in Kodaikanal.

To achieve this, we take the resorts within 30 km of Kodaikanal and add the venues within 2 km of each resort. We also add the rating and the number of likes of each resort, so that tourists can judge a resort from other visitors' feedback.

With all these details we group the resorts into clusters using K-means clustering.
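
K-means assigns points to clusters by Euclidean distance, so features measured on larger scales can weigh more heavily in the result. The sketch below is one possible variation and not what the notebook's own cells do: it standardizes two features before clustering. The (likes, rating) pairs are copied from the cluster tables later in this notebook; the use of `StandardScaler` and the choice of `n_init=10` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# (likes, rating) pairs for the 14 resorts reported in this notebook's cluster tables
features = np.array([
    [11, 7.2], [9, 8.1], [3, 5.0], [3, 5.0], [4, 5.0], [4, 5.0], [3, 5.8],
    [3, 5.0], [4, 5.0], [2, 5.0], [0, 5.0], [2, 5.0], [2, 5.0], [1, 5.0],
])

# rescale each column to zero mean and unit variance (an assumption of this sketch)
scaled = StandardScaler().fit_transform(features)

# same number of clusters and random seed as the notebook
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
labels = kmeans.labels_
print(labels)
```

Cluster numbers produced this way are arbitrary, so they need not match the labels shown elsewhere in this notebook.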


## 4. Analysis

We add the rating and the number of likes of each resort using the Foursquare API, then analyse the venues near each resort, also retrieved through the Foursquare API. Finally, we group the resorts by their nearby venue details, rating and likes.
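
To make the grouping step concrete, here is a small sketch of one-hot encoding venue categories and averaging them per resort. It uses only the ten venue categories listed for Carlton Hotel in the output further below; the variable names `sample`, `onehot` and `category_share` are chosen for this example.

```python
import pandas as pd

# the ten venue categories shown for Carlton Hotel in this notebook's output
sample = pd.DataFrame({
    'Name': ['Carlton Hotel'] * 10,
    'Venue Category': ['Resort', 'Forest', 'Indian Restaurant', 'Café', 'Trail',
                       'Park', 'Café', 'Hotel', 'Pizza Place', 'Café'],
})

# one column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(sample['Venue Category'], dtype=float)
onehot.insert(0, 'Name', sample['Name'])

# the mean of each indicator column is the share of that category near the resort
category_share = onehot.groupby('Name').mean()

top_category = category_share.loc['Carlton Hotel'].idxmax()
print(category_share.round(2))
print(top_category)
```

Because every venue contributes exactly one category, the shares in each row sum to 1, which makes resorts with different numbers of nearby venues comparable.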

In [12]:
def getVenueDetails(names, latitudes, longitudes, venueid):
    """Return a dataframe with the rating and number of likes of each resort."""
    venues_list = []
    for name, lat, lng, venue_id in zip(names, latitudes, longitudes, venueid):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            venue_id,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION)

        results = requests.get(url).json()

        # fall back to a placeholder value of 5 when a field is missing from the response
        try:
            rate = results["response"]['venue']['rating']
        except KeyError:
            rate = 5.0

        try:
            likes = results["response"]['venue']['likes']['count']
        except KeyError:
            likes = 5

        venues_list.append((name, lat, lng, venue_id, rate, likes))

    # build the dataframe once, after all resorts have been processed
    df_details = pd.DataFrame(
        venues_list,
        columns=['Name','Latitude','Longitude','Venue id','Rating','Likes'])

    return df_details

In [13]:
df = getVenueDetails(names=df_resorts['Name'],
                                   latitudes=df_resorts['Latitude'],
                                   longitudes=df_resorts['Longitude'],
                                   venueid = df_resorts['Venue ID']
                                   )

In [14]:
# inspect the raw venue-details response for a single venue
url = 'https://api.foursquare.com/v2/venues/4e5df25318a8c46e082e725e?&client_id={}&client_secret={}&v=20210106'.format(CLIENT_ID, CLIENT_SECRET)

results = requests.get(url).json()
print(results)

{'meta': {'code': 200, 'requestId': '60083f9b24a0b71749f05772'}, 'response': {'venue': {'id': '4e5df25318a8c46e082e725e', 'name': 'Ayur County Resort', 'contact': {'phone': '+919447700017', 'formattedPhone': '+91 94477 00017'}, 'location': {'address': 'Kochi-Madurai-Tondi Point Rd', 'lat': 10.030769737898341, 'lng': 77.16109011195073, 'labeledLatLngs': [{'label': 'display', 'lat': 10.030769737898341, 'lng': 77.16109011195073}], 'postalCode': '685618', 'cc': 'IN', 'city': 'Munnar', 'state': 'Kerela', 'country': 'India', 'formattedAddress': ['Kochi-Madurai-Tondi Point Rd', 'Munnar 685618', 'Kerela', 'India']}, 'canonicalUrl': 'https://foursquare.com/v/ayur-county-resort/4e5df25318a8c46e082e725e', 'categories': [{'id': '4bf58dd8d48988d12f951735', 'name': 'Resort', 'pluralName': 'Resorts', 'shortName': 'Resort', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/resort_', 'suffix': '.png'}, 'primary': True}], 'verified': False, 'stats': {'tipCount': 1}, 'url': 'http://www.ay

In [16]:
#printing the dataframe with the Rating and Likes on the Resorts
df

| | Name | Latitude | Longitude | Venue id | Rating | Likes |
|---|---|---|---|---|---|---|
| 0 | Carlton Hotel | 10.234211 | 77.488522 | 4c1601317f7f2d7f743be368 | 7.2 | 11 |
| 1 | The Kodaikanal Club | 10.234079 | 77.491351 | 4c34be637cc0c9b6f9d9f39a | 8.1 | 9 |
| 2 | Hotel Kodai International | 10.242568 | 77.498038 | 4d59fbd17e22370438f5b773 | 5.0 | 4 |
| 3 | Apple Valley Resort | 10.236985 | 77.495837 | 50a20b69e4b02fa5344ce28c | 5.0 | 2 |
| 4 | Hotel Cliffton | 10.238823 | 77.484139 | 4d58dcde7e2237043617aa73 | 5.0 | 3 |
| 5 | Villa Retreat | 10.232191 | 77.494358 | 4c164afda9c220a108605a9d | 5.0 | 4 |
| 6 | Hotel Vel's Court | 10.441988 | 77.516294 | 4ef095ae77c8053fbd0fc24a | 5.0 | 2 |
| 7 | Sterling valley view resort | 10.268499 | 77.490931 | 4dca3084b0fb9c8f8ae0172d | 5.8 | 3 |
| 8 | Sterling Holidays, Kodai - Valley View | 10.268134 | 77.48889 | 51b87edc7dd2c2c158dd08ee | 5.0 | 1 |
| 9 | GRT Nature Trails. Kodaikanal | 10.27546 | 77.486388 | 5621aaa7498e25669673fd3e | 5.0 | 3 |


In [17]:
def getNearbyVenues(names, latitudes, longitudes, rating, likes, radius=2000):
    """Return one row per venue found within `radius` metres of each resort."""
    venues_list = []
    for name, lat, lng, resort_rating, resort_likes in zip(names, latitudes, longitudes, rating, likes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            resort_rating,
            resort_likes,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Name',
                  'Name Latitude',
                  'Name Longitude',
                  'Rating',
                  'Likes',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

In [18]:
df_new4 = getNearbyVenues(names=df['Name'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   rating=df['Rating'],
                                   likes=df['Likes'])

In [19]:
# printing the dataframe with the nearby venues of the resorts
df_new4

| | Name | Name Latitude | Name Longitude | Rating | Likes | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | The Kodaikanal Club | 10.234079 | 77.491351 | Resort |
| 1 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Pine Forest | 10.232944 | 77.491317 | Forest |
| 2 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Tava | 10.235285 | 77.490924 | Indian Restaurant |
| 3 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Café Coffee Day | 10.23456 | 77.491578 | Café |
| 4 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Coaker's Walk | 10.232254 | 77.493401 | Trail |
| 5 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Bryant Park | 10.231525 | 77.491771 | Park |
| 6 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | The Tredis Tea Room | 10.238722 | 77.487912 | Café |
| 7 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Carlton Hotel | 10.234211 | 77.488522 | Hotel |
| 8 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Domino's Pizza | 10.23485 | 77.491328 | Pizza Place |
| 9 | Carlton Hotel | 10.234211 | 77.488522 | 7.2 | 11 | Cloud Street | 10.236801 | 77.491045 | Café |
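
The venues listed above were requested with the default `radius=2000` (metres) of `getNearbyVenues`. As a quick sanity check, the sketch below computes the great-circle distance between a resort and one of its nearby venues with the haversine formula. The coordinates are taken from the table above; the function name `haversine_km` and the mean Earth radius of 6371 km are choices made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# coordinates copied from the table above
carlton = (10.234211, 77.488522)
kodaikanal_club = (10.234079, 77.491351)

distance_km = haversine_km(*carlton, *kodaikanal_club)
within_radius = distance_km * 1000 <= 2000  # 2000 m is the default search radius
print(round(distance_km, 2), within_radius)
```

The same function could be applied to every row of `df_new4` to add a distance column, which would let a tourist filter venues by walking distance.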


In [20]:
df_new4.groupby('Name').count()

| Name | Name Latitude | Name Longitude | Rating | Likes | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| Apple Valley Resort | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
| Carlton Hotel | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| GRT Nature Trails. Kodaikanal | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Hill Country, Kodaikanal | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 |
| Hotel Cliffton | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
| Hotel Kodai International | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
| Hotel Vel's Court | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Le Poshe by Sparsa | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| Sterling Holidays, Kodai - By The Lake | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| Sterling Holidays, Kodai - Valley View | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |


In [21]:
# one hot encoding
kodai_onehot = pd.get_dummies(df_new4[['Venue Category']], prefix="", prefix_sep="")

# add the resort-level columns back to the dataframe
kodai_onehot['Name'] = df_new4['Name']
kodai_onehot['Rating'] = df_new4['Rating'].astype(float)
kodai_onehot['Likes'] = df_new4['Likes']
kodai_onehot['Latitude'] = df_new4['Name Latitude']
kodai_onehot['Longitude'] = df_new4['Name Longitude']

kodai_onehot.head()

Unnamed: 0,Bakery,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Hotel,Indian Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mountain,Park,Pizza Place,Resort,Rest Area,Restaurant,Scenic Lookout,South Indian Restaurant,Trail,Train Station,Name,Rating,Likes,Latitude,Longitude
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,Carlton Hotel,7.2,11,10.234211,77.488522
1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Carlton Hotel,7.2,11,10.234211,77.488522
2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,Carlton Hotel,7.2,11,10.234211,77.488522
3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Carlton Hotel,7.2,11,10.234211,77.488522
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,Carlton Hotel,7.2,11,10.234211,77.488522


In [22]:
# move the five resort-level columns (Name, Rating, Likes, Latitude, Longitude) to the front
fixed_columns = list(kodai_onehot.columns[-5:]) + list(kodai_onehot.columns[:-5])
kodai_onehot = kodai_onehot[fixed_columns]

kodai_onehot.head()

Unnamed: 0,Name,Rating,Likes,Latitude,Longitude,Bakery,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Hotel,Indian Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mountain,Park,Pizza Place,Resort,Rest Area,Restaurant,Scenic Lookout,South Indian Restaurant,Trail,Train Station
0,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
1,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0


In [23]:
kodai_onehot

Unnamed: 0,Name,Rating,Likes,Latitude,Longitude,Bakery,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Hotel,Indian Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mountain,Park,Pizza Place,Resort,Rest Area,Restaurant,Scenic Lookout,South Indian Restaurant,Trail,Train Station
0,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
1,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
5,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
6,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
9,Carlton Hotel,7.2,11,10.234211,77.488522,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
kodai_grouped = kodai_onehot.groupby('Name').mean().reset_index()
kodai_grouped

Unnamed: 0,Name,Rating,Likes,Latitude,Longitude,Bakery,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Hotel,Indian Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mountain,Park,Pizza Place,Resort,Rest Area,Restaurant,Scenic Lookout,South Indian Restaurant,Trail,Train Station
0,Apple Valley Resort,5.0,2,10.236985,77.495837,0.04,0.0,0.0,0.12,0.04,0.04,0.04,0.04,0.12,0.12,0.0,0.04,0.0,0.08,0.04,0.08,0.04,0.04,0.04,0.04,0.04,0.0
1,Carlton Hotel,7.2,11,10.234211,77.488522,0.038462,0.038462,0.0,0.115385,0.038462,0.038462,0.038462,0.038462,0.115385,0.115385,0.0,0.038462,0.0,0.038462,0.038462,0.076923,0.038462,0.038462,0.038462,0.038462,0.076923,0.0
2,GRT Nature Trails. Kodaikanal,5.0,3,10.27546,77.486388,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Hill Country, Kodaikanal",5.0,0,10.234287,77.496142,0.037037,0.0,0.0,0.111111,0.037037,0.037037,0.0,0.037037,0.111111,0.148148,0.0,0.0,0.0,0.074074,0.037037,0.148148,0.037037,0.074074,0.037037,0.037037,0.037037,0.0
4,Hotel Cliffton,5.0,3,10.238823,77.484139,0.04,0.04,0.0,0.12,0.04,0.04,0.04,0.04,0.12,0.12,0.0,0.04,0.0,0.04,0.04,0.08,0.04,0.04,0.04,0.04,0.04,0.0
5,Hotel Kodai International,5.0,4,10.242568,77.498038,0.04,0.0,0.0,0.12,0.04,0.04,0.0,0.04,0.12,0.16,0.0,0.0,0.0,0.08,0.04,0.08,0.04,0.08,0.04,0.04,0.04,0.0
6,Hotel Vel's Court,5.0,2,10.441988,77.516294,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4
7,Le Poshe by Sparsa,5.0,4,10.244396,77.495635,0.038462,0.038462,0.0,0.115385,0.038462,0.038462,0.0,0.038462,0.115385,0.153846,0.0,0.0,0.0,0.076923,0.038462,0.076923,0.038462,0.076923,0.038462,0.038462,0.038462,0.0
8,"Sterling Holidays, Kodai - By The Lake",5.0,2,10.23043,77.480737,0.038462,0.038462,0.0,0.115385,0.038462,0.038462,0.038462,0.038462,0.076923,0.115385,0.038462,0.038462,0.038462,0.038462,0.038462,0.038462,0.038462,0.038462,0.038462,0.038462,0.076923,0.0
9,"Sterling Holidays, Kodai - Valley View",5.0,1,10.268134,77.48889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [25]:
def return_most_common_venues(row, num_top_venues):
    # skip the first five columns (Name, Rating, Likes, Latitude, Longitude)
    row_categories = row.iloc[5:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
# build the table of most common venue categories per resort
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
kodai_grouped_sorted = pd.DataFrame(columns=columns)
kodai_grouped_sorted['Name'] = kodai_grouped['Name']


for ind in np.arange(kodai_grouped.shape[0]):
    kodai_grouped_sorted.iloc[ind, 1:] = return_most_common_venues(kodai_grouped.iloc[ind, :], num_top_venues)
    
kodai_grouped_sorted['Likes'] = kodai_grouped['Likes']
kodai_grouped_sorted['Rating'] = kodai_grouped['Rating']
kodai_grouped_sorted['Latitude'] = kodai_grouped['Latitude']
kodai_grouped_sorted['Longitude'] = kodai_grouped['Longitude']
kodai_grouped_sorted.head()

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Rating,Latitude,Longitude
0,Apple Valley Resort,Café,Hotel,Indian Restaurant,Resort,Park,Trail,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,2,5.0,10.236985,77.495837
1,Carlton Hotel,Café,Hotel,Indian Restaurant,Resort,Trail,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,11,7.2,10.234211,77.488522
2,GRT Nature Trails. Kodaikanal,Resort,Train Station,Indian Restaurant,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.0,10.27546,77.486388
3,"Hill Country, Kodaikanal",Resort,Indian Restaurant,Café,Hotel,Park,Restaurant,Trail,Forest,Bakery,Dim Sum Restaurant,0,5.0,10.234287,77.496142
4,Hotel Cliffton,Café,Hotel,Indian Restaurant,Resort,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Trail,3,5.0,10.238823,77.484139


In [27]:
# set number of clusters
kclusters = 3

# drop the resort name; all remaining columns are numeric features
kodai_grouped_clustering = kodai_grouped.drop(columns='Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kodai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 1, 2, 1, 1, 2, 1, 2, 2])

In [28]:
# add clustering labels
kodai_grouped_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
kodai_grouped_sorted.sort_values(by=['Cluster Labels'], inplace=True)
kodai_grouped_sorted

Unnamed: 0,Cluster Labels,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Rating,Latitude,Longitude
1,0,Carlton Hotel,Café,Hotel,Indian Restaurant,Resort,Trail,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,11,7.2,10.234211,77.488522
12,0,The Kodaikanal Club,Café,Hotel,Indian Restaurant,Resort,Trail,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,9,8.1,10.234079,77.491351
2,1,GRT Nature Trails. Kodaikanal,Resort,Train Station,Indian Restaurant,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.0,10.27546,77.486388
4,1,Hotel Cliffton,Café,Hotel,Indian Restaurant,Resort,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Trail,3,5.0,10.238823,77.484139
5,1,Hotel Kodai International,Indian Restaurant,Café,Hotel,Park,Restaurant,Resort,Trail,Forest,Bakery,Dim Sum Restaurant,4,5.0,10.242568,77.498038
7,1,Le Poshe by Sparsa,Indian Restaurant,Café,Hotel,Park,Restaurant,Resort,Boat Rental,Forest,Trail,Bakery,4,5.0,10.244396,77.495635
10,1,Sterling valley view resort,Resort,Train Station,Indian Restaurant,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.8,10.268499,77.490931
11,1,Sterling's Kodai By The Lake Resort,Café,Hotel,Indian Restaurant,Trail,Middle Eastern Restaurant,Boat Rental,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.0,10.230078,77.480821
13,1,Villa Retreat,Hotel,Indian Restaurant,Café,Resort,Trail,Bakery,Modern European Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,4,5.0,10.232191,77.494358
0,2,Apple Valley Resort,Café,Hotel,Indian Restaurant,Resort,Park,Trail,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,2,5.0,10.236985,77.495837


In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kodai_grouped_sorted['Latitude'], kodai_grouped_sorted['Longitude'], kodai_grouped_sorted['Name'], kodai_grouped_sorted['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.5).add_to(map_clusters)
    
    
map_clusters

In [30]:
kodai_grouped_sorted.loc[kodai_grouped_sorted['Cluster Labels'] == 0, kodai_grouped_sorted.columns[[1] + list(range(2, kodai_grouped_sorted.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Rating,Latitude,Longitude
1,Carlton Hotel,Café,Hotel,Indian Restaurant,Resort,Trail,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,11,7.2,10.234211,77.488522
12,The Kodaikanal Club,Café,Hotel,Indian Restaurant,Resort,Trail,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,9,8.1,10.234079,77.491351


In [31]:
kodai_grouped_sorted.loc[kodai_grouped_sorted['Cluster Labels'] == 1, kodai_grouped_sorted.columns[[1] + list(range(2, kodai_grouped_sorted.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Rating,Latitude,Longitude
2,GRT Nature Trails. Kodaikanal,Resort,Train Station,Indian Restaurant,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.0,10.27546,77.486388
4,Hotel Cliffton,Café,Hotel,Indian Restaurant,Resort,Modern European Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,Trail,3,5.0,10.238823,77.484139
5,Hotel Kodai International,Indian Restaurant,Café,Hotel,Park,Restaurant,Resort,Trail,Forest,Bakery,Dim Sum Restaurant,4,5.0,10.242568,77.498038
7,Le Poshe by Sparsa,Indian Restaurant,Café,Hotel,Park,Restaurant,Resort,Boat Rental,Forest,Trail,Bakery,4,5.0,10.244396,77.495635
10,Sterling valley view resort,Resort,Train Station,Indian Restaurant,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.8,10.268499,77.490931
11,Sterling's Kodai By The Lake Resort,Café,Hotel,Indian Restaurant,Trail,Middle Eastern Restaurant,Boat Rental,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,3,5.0,10.230078,77.480821
13,Villa Retreat,Hotel,Indian Restaurant,Café,Resort,Trail,Bakery,Modern European Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,4,5.0,10.232191,77.494358


In [32]:
kodai_grouped_sorted.loc[kodai_grouped_sorted['Cluster Labels'] == 2, kodai_grouped_sorted.columns[[1] + list(range(2, kodai_grouped_sorted.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Rating,Latitude,Longitude
0,Apple Valley Resort,Café,Hotel,Indian Restaurant,Resort,Park,Trail,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,2,5.0,10.236985,77.495837
3,"Hill Country, Kodaikanal",Resort,Indian Restaurant,Café,Hotel,Park,Restaurant,Trail,Forest,Bakery,Dim Sum Restaurant,0,5.0,10.234287,77.496142
6,Hotel Vel's Court,Train Station,Bus Station,Hotel,Indian Restaurant,Trail,Boat Rental,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,2,5.0,10.441988,77.516294
8,"Sterling Holidays, Kodai - By The Lake",Café,Indian Restaurant,Trail,Hotel,Middle Eastern Restaurant,Boat Rental,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,2,5.0,10.23043,77.480737
9,"Sterling Holidays, Kodai - Valley View",Resort,Train Station,Indian Restaurant,Boat Rental,Bus Station,Café,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Forest,1,5.0,10.268134,77.48889


## 5. Results and Discussion

Based on the clustering results, we make the following observations:

 - Cluster 0 contains resorts with more than 5 likes and a rating above 5, with a few restaurants nearby. There are 2 resorts in this cluster.

 - Cluster 1 contains resorts with 3 or 4 likes and a mix of nearby venues such as parks, boat rentals, restaurants and forests. There are 7 resorts in this cluster.

 - Cluster 2 contains resorts with 2 or fewer likes and a mix of nearby venues such as parks, boat rentals, restaurants and forests. There are 5 resorts in this cluster.
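
These observations can be cross-checked with a per-cluster summary. The sketch below re-enters the cluster label, likes and rating of the 14 resorts from the cluster tables above and aggregates them; the dataframe name `clusters` and the summary column names are chosen for this example.

```python
import pandas as pd

# (cluster label, likes, rating) for the 14 resorts, copied from the cluster tables above
clusters = pd.DataFrame(
    [(0, 11, 7.2), (0, 9, 8.1),
     (1, 3, 5.0), (1, 3, 5.0), (1, 4, 5.0), (1, 4, 5.0),
     (1, 3, 5.8), (1, 3, 5.0), (1, 4, 5.0),
     (2, 2, 5.0), (2, 0, 5.0), (2, 2, 5.0), (2, 2, 5.0), (2, 1, 5.0)],
    columns=['Cluster Labels', 'Likes', 'Rating'])

# number of resorts, range of likes and mean rating per cluster
summary = clusters.groupby('Cluster Labels').agg(
    resorts=('Likes', 'size'),
    min_likes=('Likes', 'min'),
    max_likes=('Likes', 'max'),
    mean_rating=('Rating', 'mean'))
print(summary)
```

In the notebook itself, the same aggregation could be run directly on `kodai_grouped_sorted` instead of retyping the values.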


## 6. Conclusion

Tourists who prefer resorts with a higher number of likes that are not close to other tourist venues can choose a resort from cluster 0. Tourists who prefer resorts near tourist venues with a moderate number of likes can choose a resort from cluster 1. Tourists who prefer resorts near tourist venues and are not concerned about the number of likes can choose a resort from cluster 2.