### IBM Applied Data Science Capstone Course by Coursera
#### Week 5 Final Report
#### Identifying & Recommending the locations to open a new Hotel or Resort in Visakhapatnam, Andhra Pradesh, India

###### 1. Build the master DataFrame by web scraping the Wikipedia page that lists the neighbourhoods of Visakhapatnam, India
###### 2. Get the geographical coordinates of the neighbourhoods of Visakhapatnam
###### 3. Use the Foursquare API to get hotel and resort data for the neighbourhoods of Visakhapatnam
###### 4. Run k-means clustering and explore/analyze each of the clusters
###### 5. Recommend the best locations to open a hotel or a resort in and around Visakhapatnam city

#### 1. Import libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a pandas DataFrame (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


#### 2. Scrape data from the Wikipedia page into a DataFrame using the URL: "https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam"

In [4]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam").text

# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

# create a list to store neighborhood data
neighborhoodList = []

# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)
    
# create a new DataFrame from the list
vskp_df = pd.DataFrame({"Neighborhood": neighborhoodList})

print(vskp_df.head())

# print the number of rows of the dataframe
vskp_df.shape

   Neighborhood
0     Abidnagar
1  Adarsh Nagar
2    Adavivaram
3    Aganampudi
4  Akkayyapalem


(121, 1)
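
The list-extraction step can be tried without a network call. The sketch below is a minimal offline illustration: the HTML string is a hand-written stand-in for the category page (not the real page source), and `parse_neighborhoods` is a hypothetical helper name. Like the cell above, it reads only the first `mw-category` block, but it returns an empty list instead of raising an `IndexError` when that block is missing.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the category page markup (an assumption, not the real page)
SAMPLE_HTML = """
<div class="mw-category">
  <ul>
    <li><a href="#">Abidnagar</a></li>
    <li><a href="#">Adarsh Nagar</a></li>
    <li><a href="#">Adavivaram</a></li>
  </ul>
</div>
"""

def parse_neighborhoods(html):
    """Return the text of every <li> in the first div with class 'mw-category'."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.select_one("div.mw-category")  # first matching block, like [0] in the cell above
    if block is None:
        return []  # page layout changed or the request returned an error page
    return [li.get_text(strip=True) for li in block.find_all("li")]

names = parse_neighborhoods(SAMPLE_HTML)
print(names)
print(parse_neighborhoods("<html><body>no list here</body></html>"))
```

Keeping the parsing in a small function like this makes it possible to check the selector against a saved copy of the page before running the whole notebook.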

#### 3. Get the geographical coordinates (latitude and longitude) for all the neighbourhoods

In [6]:
neh = []
lat = []
lon = []
for n in vskp_df['Neighborhood']:
    g=geocoder.arcgis(n,timeout=None)
    neh.append(n)
    lon.append(g.x)
    lat.append(g.y) 
#     print(lat)
#     print(lon)

In [7]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame({'Latitude' : lat, 'Longitude' : lon})

In [8]:
# merge the coordinates into the original dataframe
vskp_df['Latitude'] = df_coords['Latitude']
vskp_df['Longitude'] = df_coords['Longitude']

In [9]:
# check the neighborhoods and the coordinates
print(vskp_df.shape)
vskp_df

(121, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Abidnagar,17.73786,83.29888
1,Adarsh Nagar,23.26888,77.40537
2,Adavivaram,17.78583,83.25242
3,Aganampudi,17.68904,83.13988
4,Akkayyapalem,17.73421,83.29713
5,Akkireddypalem,17.70872,83.20904
6,Allipuram,14.48821,80.04214
7,Anakapalle,17.68984,83.00175
8,Anandapuram,17.87772,83.30459
9,Appikonda,17.59616,83.20241
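
Some of the coordinates above lie far from the others: Adarsh Nagar is geocoded at latitude 23.27 and Allipuram at 14.49, while the remaining rows sit near 17.7. A likely cause is that the bare neighborhood name matches a place of the same name elsewhere. The sketch below is one way to flag such rows, assuming a 100 km cut-off around the median coordinate; both the cut-off and the helper name `haversine_km` are illustrative choices, not part of the notebook.

```python
import math
from statistics import median

# (name, latitude, longitude) rows copied from the table above
rows = [
    ("Abidnagar", 17.73786, 83.29888),
    ("Adarsh Nagar", 23.26888, 77.40537),
    ("Adavivaram", 17.78583, 83.25242),
    ("Aganampudi", 17.68904, 83.13988),
    ("Akkayyapalem", 17.73421, 83.29713),
    ("Akkireddypalem", 17.70872, 83.20904),
    ("Allipuram", 14.48821, 80.04214),
    ("Anakapalle", 17.68984, 83.00175),
    ("Anandapuram", 17.87772, 83.30459),
    ("Appikonda", 17.59616, 83.20241),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Reference point: the median latitude/longitude, which a few outliers cannot drag far
ref_lat = median(r[1] for r in rows)
ref_lon = median(r[2] for r in rows)

MAX_KM = 100  # assumed cut-off, not a value from the notebook

suspect = [name for name, lat, lon in rows
           if haversine_km(lat, lon, ref_lat, ref_lon) > MAX_KM]
print(suspect)
```

For these ten rows the sketch flags Adarsh Nagar and Allipuram. Flagged rows could then be re-geocoded with a more specific query string or dropped before the venue search.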


In [11]:
# save the DataFrame as CSV file
vskp_df.to_csv("Vizag_df.csv",index = False)

#### 4. Create a map of Visakhapatnam city with neighborhoods 

In [12]:
# get the coordinates of Visakhapatnam
address = 'Visakhapatnam, India'

geolocator = Nominatim(user_agent="my-application",timeout=None)
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Visakhapatnam, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Visakhapatnam, India are 17.89937045, 82.5642200595236.


In [13]:
# create map of Visakhapatnam using latitude and longitude values
map_vskp = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(vskp_df['Latitude'], vskp_df['Longitude'], vskp_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_vskp)  
    
map_vskp

In [15]:
# save the map as HTML file
map_vskp.save('map_vskp.html')

#### 5. Use the Foursquare API to explore the neighborhoods

In [16]:
# define Foursquare credentials and version
# Credentials are read from environment variables instead of being hard-coded;
# the variable names below are an assumption, adjust them to your setup.
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


##### Now, let's get the top 100 venues that are within a radius of 5000 meters.

In [18]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(vskp_df['Latitude'], vskp_df['Longitude'], vskp_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
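
The loop above indexes straight into the response, so a response without `groups`, or a venue without any category, would stop the whole run with a `KeyError` or `IndexError`. The sketch below shows one defensive way to flatten a response of the same shape. The payload is hand-written to mimic the nesting used above (it is not real API output), and `extract_venues` is a hypothetical helper name.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into tuples, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

# Hand-written payload that mimics the nesting the loop above relies on
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Food Ex",
               "location": {"lat": 17.722155, "lng": 83.318422},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 17.7, "lng": 83.3},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Abidnagar", 17.73786, 83.29888)
print(rows)
print(extract_venues({"response": {}}, "Abidnagar", 17.73786, 83.29888))
```

The tuples have the same seven fields as the ones appended in the loop, so the resulting list can be passed to `pd.DataFrame` in the same way.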

In [19]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(3596, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abidnagar,17.73786,83.29888,Food Ex,17.722155,83.318422,Fast Food Restaurant
1,Abidnagar,17.73786,83.29888,Cream & Fudge,17.719339,83.311927,Ice Cream Shop
2,Abidnagar,17.73786,83.29888,Sai Ram Parlour,17.726339,83.303465,Indian Restaurant
3,Abidnagar,17.73786,83.29888,Pastry Coffee & Conversation,17.724092,83.317831,Café
4,Abidnagar,17.73786,83.29888,Waltair Club,17.72058,83.316784,Restaurant


#### Let's check how many venues were returned for each neighborhood

In [20]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Abidnagar,78,78,78,78,78,78
Adarsh Nagar,34,34,34,34,34,34
Adavivaram,5,5,5,5,5,5
Aganampudi,4,4,4,4,4,4
Akkayyapalem,73,73,73,73,73,73
Akkireddypalem,10,10,10,10,10,10
Allipuram,1,1,1,1,1,1
Anakapalle,4,4,4,4,4,4
Anandapuram,2,2,2,2,2,2
Appikonda,5,5,5,5,5,5


#### Let's find out how many unique categories can be curated from all the returned venues

In [21]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 126 unique categories.


In [22]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Fast Food Restaurant', 'Ice Cream Shop', 'Indian Restaurant',
       'Café', 'Restaurant', 'Beach', 'Snack Place', 'Hotel', 'Multiplex',
       'Indie Movie Theater', 'Fabric Shop', 'Food Court', 'Pizza Place',
       'Steakhouse', 'Juice Bar', 'Italian Restaurant',
       'Multicuisine Indian Restaurant', 'Breakfast Spot', 'Coffee Shop',
       'Bookstore', 'Resort', 'Clothing Store', 'Shopping Mall',
       'Science Museum', 'Dessert Shop', 'Train Station', 'Bakery',
       'Platform', 'Bus Station', 'Department Store', 'Track Stadium',
       'Smoke Shop', 'Grocery Store', 'Salad Place', 'Gift Shop',
       'Lounge', 'Park', 'Pub', 'Cafeteria', 'Trail', 'Gastropub',
       'Harbor / Marina', 'Tea Room', 'Market', 'Asian Restaurant',
       'Diner', 'Burger Joint', 'Basketball Stadium',
       'Chinese Restaurant', 'Rest Area', 'Garden Center',
       'Mattress Store', 'Hockey Arena', 'Volleyball Court',
       'Golf Course', 'Athletics & Sports', 'Pier', 'Food Truck', 'ATM',

In [23]:
# check whether "Neighborhood" appears as a venue category (it would clash with a column of the same name)
"Neighborhood" in venues_df['VenueCategory'].unique()

True

#### 6. Analyze Each Neighborhood

In [24]:
# one hot encoding
vskp_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vskp_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vskp_onehot.columns[-1]] + list(vskp_onehot.columns[:-1])
vskp_onehot = vskp_onehot[fixed_columns]

print(vskp_onehot.shape)
vskp_onehot.head()

(3596, 127)


Unnamed: 0,Neighborhoods,ATM,Airport,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Basketball Stadium,Beach,Beach Bar,Beer Garden,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Burger Joint,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Church,Clothing Store,Coffee Shop,College Cafeteria,Convenience Store,Convention Center,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Dive Bar,Donut Shop,Electronics Store,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Grocery Store,Gym,Harbor / Marina,Hockey Arena,Home Service,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Lake,Lounge,Malay Restaurant,Market,Massage Studio,Mattress Store,Molecular Gastronomy Restaurant,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Neighborhood,New American Restaurant,Night Market,Nightclub,Noodle House,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pub,Rajasthani Restaurant,Resort,Rest Area,Restaurant,River,Road,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Shopping Mall,Smoke Shop,Snack Place,Soup Place,South Indian Restaurant,Spa,Steakhouse,Supermarket,Tapas Restaurant,Tea Room,Thai Restaurant,Theme Park Ride / Attraction,Tourist Information Center,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Volleyball Court,Whisky Bar,Women's Store
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [25]:
vskp_grouped = vskp_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(vskp_grouped.shape)
vskp_grouped

(111, 127)


Unnamed: 0,Neighborhoods,ATM,Airport,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Basketball Stadium,Beach,Beach Bar,Beer Garden,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Burger Joint,Bus Station,Cafeteria,Café,Campground,Chinese Restaurant,Church,Clothing Store,Coffee Shop,College Cafeteria,Convenience Store,Convention Center,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Dive Bar,Donut Shop,Electronics Store,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Grocery Store,Gym,Harbor / Marina,Hockey Arena,Home Service,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Lake,Lounge,Malay Restaurant,Market,Massage Studio,Mattress Store,Molecular Gastronomy Restaurant,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Neighborhood,New American Restaurant,Night Market,Nightclub,Noodle House,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pub,Rajasthani Restaurant,Resort,Rest Area,Restaurant,River,Road,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Shopping Mall,Smoke Shop,Snack Place,Soup Place,South Indian Restaurant,Spa,Steakhouse,Supermarket,Tapas Restaurant,Tea Room,Thai Restaurant,Theme Park Ride / Attraction,Tourist Information Center,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Volleyball Court,Whisky Bar,Women's Store
0,Abidnagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.012821,0.012821,0.0,0.0,0.012821,0.012821,0.102564,0.0,0.0,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.025641,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.025641,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.012821,0.0,0.012821,0.0,0.0,0.064103,0.0,0.051282,0.128205,0.025641,0.0,0.012821,0.0,0.0,0.012821,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.025641,0.012821,0.0,0.0,0.012821,0.0,0.012821,0.0,0.038462,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.012821,0.012821,0.038462,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.025641,0.0,0.0,0.0,0.0,0.0
1,Adarsh Nagar,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.058824,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,0.0,0.029412,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.117647,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0
2,Adavivaram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aganampudi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Akkayyapalem,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,0.0,0.0,0.027397,0.0,0.0,0.0,0.013699,0.013699,0.0,0.0,0.013699,0.013699,0.09589,0.0,0.0,0.0,0.013699,0.013699,0.0,0.0,0.0,0.0,0.013699,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.013699,0.027397,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.013699,0.0,0.0,0.082192,0.0,0.054795,0.082192,0.027397,0.0,0.013699,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.041096,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027397,0.0,0.0,0.013699,0.027397,0.013699,0.0,0.0,0.013699,0.0,0.013699,0.0,0.041096,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,0.013699,0.0,0.041096,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.027397,0.0,0.0,0.013699,0.0,0.0
5,Akkireddypalem,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0
6,Allipuram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Anakapalle,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
8,Anandapuram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Appikonda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
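
To see why the mean of the one-hot columns is a frequency, it helps to run the same two steps on a tiny table. The sketch below uses made-up rows (not the notebook's data): each venue contributes a 1 in exactly one category column, so the per-neighborhood mean of a column is the share of that neighborhood's venues in that category.

```python
import pandas as pd

# Toy venue list (made-up rows, not the notebook's data)
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Hotel", "Hotel", "Beach", "Resort", "Beach", "Hotel"],
})

# One column per category, one row per venue, with a single 1 in each row
onehot = pd.get_dummies(toy["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])

# Mean of the 0/1 columns per neighborhood = share of venues in each category
grouped = onehot.groupby("Neighborhoods").mean().reset_index()
print(grouped)
```

Neighborhood A has four venues, two of them hotels, so its `Hotel` value is 0.5. Each row of the grouped table sums to 1, which also means a neighborhood with a single venue gets a frequency of 1.0 for that venue's category.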


In [27]:
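# note: indexing with a boolean DataFrame keeps every row (cells that fail the test become NaN),
# so this returns the total number of neighborhoods, not only those with a hotel or resort
len(vskp_grouped[vskp_grouped[["Hotel","Resort"]] > 0])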

111
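
The result, 111, equals the number of rows in `vskp_grouped`, which suggests the expression measures the length of the frame rather than the number of neighborhoods that have a hotel or a resort. If the latter is what is wanted, a row-wise test is one option. The sketch below runs on made-up values, not the notebook's data.

```python
import pandas as pd

# Toy frequency table (made-up values)
toy = pd.DataFrame({
    "Neighborhoods": ["A", "B", "C", "D"],
    "Hotel":  [0.06, 0.0, 0.0, 0.2],
    "Resort": [0.01, 0.0, 0.3, 0.0],
})

# True for rows where at least one of the two columns is positive
has_any = (toy[["Hotel", "Resort"]] > 0).any(axis=1)

n_with = int(has_any.sum())
names_with = toy.loc[has_any, "Neighborhoods"].tolist()
print(n_with, names_with)
```

Applied to `vskp_grouped`, the same mask would give the count of neighborhoods with at least one hotel or resort among their returned venues.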

#### Create a new DataFrame for Hotel and Resort data only

In [44]:
vskp_hotel = vskp_grouped[["Neighborhoods","Hotel","Resort"]]

In [45]:
vskp_hotel.head()

Unnamed: 0,Neighborhoods,Hotel,Resort
0,Abidnagar,0.064103,0.012821
1,Adarsh Nagar,0.058824,0.029412
2,Adavivaram,0.0,0.0
3,Aganampudi,0.0,0.0
4,Akkayyapalem,0.082192,0.013699


#### 7. Cluster Neighborhoods

In [46]:
# set number of clusters
kclusters = 3

vskp_clustering = vskp_hotel.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vskp_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 1, 0, 0, 1, 0, 1, 1])
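
The number of clusters is fixed at 3 above. One common way to sanity-check such a choice is to compare silhouette scores over a few values of k. The sketch below does this on synthetic two-column data with made-up values that stand in for the `Hotel` and `Resort` frequencies. It is not the notebook's data, and the silhouette score is a swapped-in check that the notebook itself does not use.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic (Hotel, Resort) frequencies: three tight, well-separated groups (made-up values)
centers = np.array([[0.08, 0.01], [0.00, 0.00], [0.00, 0.25]])
offsets = np.array([[0.002, 0.0], [-0.002, 0.0], [0.0, 0.002], [0.0, -0.002]])
X = np.vstack([c + offsets for c in centers])  # 12 points, 4 around each center

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher means tighter, better-separated clusters

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On the real data the same loop would take `vskp_clustering` in place of `X`. The scores there may be much closer together than in this toy case, so they are a guide rather than a rule.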

In [47]:
# create a new dataframe that holds the hotel/resort frequencies together with the cluster label of each neighborhood
vskp_merged = vskp_hotel.copy()

# add clustering labels
vskp_merged["Cluster Labels"] = kmeans.labels_

In [48]:
vskp_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
vskp_merged.head()

Unnamed: 0,Neighborhood,Hotel,Resort,Cluster Labels
0,Abidnagar,0.064103,0.012821,0
1,Adarsh Nagar,0.058824,0.029412,0
2,Adavivaram,0.0,0.0,1
3,Aganampudi,0.0,0.0,1
4,Akkayyapalem,0.082192,0.013699,0


In [49]:
# join vskp_merged with vskp_df to add latitude/longitude for each neighborhood
vskp_merged = vskp_merged.join(vskp_df.set_index("Neighborhood"), on="Neighborhood")

print(vskp_merged.shape)
vskp_merged.head() # check the last columns!

(111, 6)


Unnamed: 0,Neighborhood,Hotel,Resort,Cluster Labels,Latitude,Longitude
0,Abidnagar,0.064103,0.012821,0,17.73786,83.29888
1,Adarsh Nagar,0.058824,0.029412,0,23.26888,77.40537
2,Adavivaram,0.0,0.0,1,17.78583,83.25242
3,Aganampudi,0.0,0.0,1,17.68904,83.13988
4,Akkayyapalem,0.082192,0.013699,0,17.73421,83.29713


In [50]:
# sort the results by Cluster Labels
print(vskp_merged.shape)
vskp_merged.sort_values(["Cluster Labels"], inplace=True)
vskp_merged

(111, 6)


Unnamed: 0,Neighborhood,Hotel,Resort,Cluster Labels,Latitude,Longitude
0,Abidnagar,0.064103,0.012821,0,17.73786,83.29888
77,Railway New Colony,0.073529,0.014706,0,17.72862,83.29206
76,Prakashraopeta,0.086957,0.014493,0,17.71718,83.30575
73,Poorna Market,0.087719,0.017544,0,17.70682,83.29815
72,Pithapuram Colony,0.072464,0.014493,0,17.73563,83.32231
70,Pedagantyada,0.166667,0.0,0,17.6668,83.2104
69,Peda Waltair,0.069444,0.013889,0,17.73333,83.33333
68,"Pandurangapuram, Visakhapatnam",0.069444,0.013889,0,17.71793,83.32849
66,One Town (Visakhapatnam),0.129032,0.0,0,17.71984,83.26278
64,Nathayyapalem,0.111111,0.0,0,17.71099,83.20239


#### Finally, let's visualize the resulting clusters

In [51]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vskp_merged['Latitude'], vskp_merged['Longitude'], vskp_merged['Neighborhood'], vskp_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [52]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

#### 8. Examine Clusters

In [53]:
vskp_merged.groupby('Cluster Labels')['Neighborhood'].count()

Cluster Labels
0    55
1    54
2     2
Name: Neighborhood, dtype: int64
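
Counts alone do not say what distinguishes the clusters. Averaging the two features per cluster gives a quick profile. The sketch below uses six rows copied from the tables in this report, two per cluster, so the means are illustrative only and would differ on the full 111 rows.

```python
import pandas as pd

# Two rows per cluster, copied from the tables in this report
rows = [
    ("Abidnagar", 0.064103, 0.012821, 0),
    ("Pedagantyada", 0.166667, 0.0, 0),
    ("Adavivaram", 0.0, 0.0, 1),
    ("Rushikonda", 0.047619, 0.0, 1),
    ("Muralinagar", 0.0, 0.166667, 2),
    ("Bheemunipatnam", 0.0, 0.333333, 2),
]
sample = pd.DataFrame(rows, columns=["Neighborhood", "Hotel", "Resort", "Cluster Labels"])

# Mean hotel/resort frequency and number of neighborhoods per cluster
profile = sample.groupby("Cluster Labels")[["Hotel", "Resort"]].mean()
profile["Count"] = sample.groupby("Cluster Labels")["Neighborhood"].count()
print(profile.round(3))
```

On this subset, cluster 0 has the highest mean hotel frequency, cluster 1 is close to zero on both features, and cluster 2 has resorts but no hotels. Running the same `groupby` on `vskp_merged` would show whether that pattern holds for all neighborhoods.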

In [54]:
vskp_merged.loc[vskp_merged['Cluster Labels'] == 0]
    

Unnamed: 0,Neighborhood,Hotel,Resort,Cluster Labels,Latitude,Longitude
0,Abidnagar,0.064103,0.012821,0,17.73786,83.29888
77,Railway New Colony,0.073529,0.014706,0,17.72862,83.29206
76,Prakashraopeta,0.086957,0.014493,0,17.71718,83.30575
73,Poorna Market,0.087719,0.017544,0,17.70682,83.29815
72,Pithapuram Colony,0.072464,0.014493,0,17.73563,83.32231
70,Pedagantyada,0.166667,0.0,0,17.6668,83.2104
69,Peda Waltair,0.069444,0.013889,0,17.73333,83.33333
68,"Pandurangapuram, Visakhapatnam",0.069444,0.013889,0,17.71793,83.32849
66,One Town (Visakhapatnam),0.129032,0.0,0,17.71984,83.26278
64,Nathayyapalem,0.111111,0.0,0,17.71099,83.20239


In [55]:
vskp_merged.loc[vskp_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Hotel,Resort,Cluster Labels,Latitude,Longitude
80,Ravindra Nagar,0.0,0.0,1,24.54131,81.3104
83,Rushikonda,0.047619,0.0,1,17.79325,83.38739
19,Chinna Gadhili,0.0,0.0,1,31.09586,74.89851
84,Sabbavaram,0.0,0.0,1,17.79343,83.11948
85,Sagar Nagar,0.04,0.0,1,21.20316,79.07717
86,Salipeta,0.0,0.0,1,18.31227,83.2562
14,"Beach Road, Visakhapatnam",0.0,0.0,1,17.812528,83.407895
39,Jagadamba Centre,0.0,0.0,1,20.55427,78.84522
94,Sontyam,0.0,0.0,1,17.86646,83.28725
12,Atchutapuram,0.0,0.0,1,17.5655,82.98174


In [56]:
vskp_merged.loc[vskp_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Hotel,Resort,Cluster Labels,Latitude,Longitude
60,Muralinagar,0.0,0.166667,2,23.29746,77.39692
15,Bheemunipatnam,0.0,0.333333,2,17.88935,83.45037


### Conclusion:
The clusters show that the area has many more hotels than resorts. All the resorts are on the seashore, and they are concentrated in one place around Gangavaram. Hotels and resorts overall are concentrated in the northern and southern parts of Visakhapatnam, which correspond to cluster 0 and cluster 1.

The two locations in cluster 2, Bheemunipatnam and Muralinagar, are on the beach road that connects Visakhapatnam with the nearby town of Bheemunipatnam, a stretch of about 25 kilometers. A hotel or resort on this stretch would benefit tourists, who get the ambience of a beach hotel or beach resort, and it would also benefit infrastructure developers.

Because hotels and resorts are currently concentrated in clusters 0 and 1, competition in those locations is very high. Developers who focus on cluster 2 are therefore likely to do better, both in business terms and in attracting tourists and visitors to Visakhapatnam.