# HI THERE
 This notebook will consist of the following parts:<br>
 
 1. Data collection<br>
 2. Data manipulation<br>
 3. Modeling with K-means<br>
 4. Visualization through Folium

### DATA COLLECTION

The primary data sources are:<br>
1. TripAdvisor website<br>
2. Geocoder using geopy<br>
3. Foursquare API<br>

In [46]:
#importing essential libraries
import numpy as np
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

#import beautiful soup
from urllib.request import urlopen
from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


Using the **BeautifulSoup** library, we extract the top 10 destinations that one must visit in Kolkata.
You can take a look at the source page yourself:<br>
__[TripAdvisor](https://www.tripadvisor.in/Attractions-g304558-Activities-Kolkata_Calcutta_Kolkata_District_West_Bengal.html)__

In [2]:
#creating url and beautifulsoup objects for web scraping
url='https://www.tripadvisor.in/Attractions-g304558-Activities-Kolkata_Calcutta_Kolkata_District_West_Bengal.html'
result=requests.get(url)
print(result.status_code) # prints 200 if the URL is accessible
soup=BeautifulSoup(result.content,'html.parser')

200


While using the geocoder, I found that some of the locations did not return a latitude and longitude. I suspected spelling mistakes in the names, but that was not the case. So I removed those locations using the list **geocoder_not_working**.

In [3]:
#extracting the locations and converting them to a data frame
links=soup.find_all('a',class_='attractions-attraction-overview-pois-PoiInfo__name--SJ0a4')
text_list=[link.get_text() for link in links]
text_list=text_list[:13]
Top_Locations=pd.DataFrame(text_list)
Top_Locations.columns=['name']
geocoder_not_working=['Dakshineswar Kali Temple','Eco Tourism Park','College Street (Boi Para)']
Top_Locations=Top_Locations[~Top_Locations['name'].isin(geocoder_not_working)]
locations_list=Top_Locations['name'].tolist()

The locations with their co-ordinates are stored in a pandas DataFrame (Top_Locations) for later marking in the folium map.

In [4]:
#loop through each location to find their co-ordinates
geolocator = Nominatim(user_agent="my-application") # create the geocoder once and reuse it
lat=[]
long=[]
for ele in locations_list:
    address = '{}, Kolkata'.format(ele)
    location = geolocator.geocode(address)
    lat.append(location.latitude)
    long.append(location.longitude)
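
The missing results mentioned earlier could also be detected at run time instead of being listed by hand. Below is a minimal sketch, assuming a geocoding callable that returns `None` when nothing is found (which is how geopy's `geocode` behaves). The names `geocode_all` and `fake_geocode` are hypothetical, and the stub stands in for the real geocoder so the example runs offline.

```python
def geocode_all(names, geocode):
    """Split place names into geocoded results and unresolved names."""
    found = {}
    missing = []
    for name in names:
        result = geocode('{}, Kolkata'.format(name))
        if result is None:
            # the geocoder found nothing for this address
            missing.append(name)
        else:
            found[name] = result
    return found, missing


def fake_geocode(address):
    """Hypothetical offline stand-in for a real geocoder call."""
    known = {'Howrah Bridge, Kolkata': (22.585091, 88.346825)}
    return known.get(address)


found, missing = geocode_all(['Howrah Bridge', 'Eco Tourism Park'], fake_geocode)
print(found)
print(missing)
```

With the real geocoder, the callable could wrap `geolocator.geocode` and return the latitude and longitude of the result, or `None` when there is no result.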

In [5]:
#creating a comprehensive data frame
Top_Locations['Lat']=lat
Top_Locations['Long']=long
Top_Locations

Unnamed: 0,name,Lat,Long
0,Victoria Memorial Hall,22.54508,88.342643
2,Mother House,22.553101,88.363662
3,Park Street,22.555159,88.350117
5,Howrah Bridge,22.585091,88.346825
6,Eden Gardens,22.564588,88.34229
7,Science City,22.539925,88.39581
8,Quest Mall,22.539027,88.365656
10,Prinsep Ghat,22.556573,88.331418
11,Birla Planetarium,22.545507,88.347318
12,New Market,22.560119,88.356735


Next, we use the geocoder from the geopy library to get the coordinates of the city of Kolkata, which serve as the center of the Foursquare search.

In [6]:
#geocoding location in co-ordinates
address = 'Kolkata, West Bengal'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Kolkata are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Kolkata are 22.5414392, 88.3517606171429.


Setting up FourSquare credentials:

In [7]:
CLIENT_ID = 'LOGKHYI2TF0NAO3SONDBWJWAC5J3MIHLYVMFRUX4DOFXDBY0' # your Foursquare ID
CLIENT_SECRET = 'VHXL2Q0JRYHHHCZ0KEXJRIAJO5MJIPWCFCGDYB0GVECXUV5V' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LOGKHYI2TF0NAO3SONDBWJWAC5J3MIHLYVMFRUX4DOFXDBY0
CLIENT_SECRET:VHXL2Q0JRYHHHCZ0KEXJRIAJO5MJIPWCFCGDYB0GVECXUV5V


In [8]:
LIMIT = 1000 # limit of number of venues returned by Foursquare API
radius = 10000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=LOGKHYI2TF0NAO3SONDBWJWAC5J3MIHLYVMFRUX4DOFXDBY0&client_secret=VHXL2Q0JRYHHHCZ0KEXJRIAJO5MJIPWCFCGDYB0GVECXUV5V&v=20180605&ll=22.5414392,88.3517606171429&radius=10000&limit=1000'
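
As an alternative to formatting the query string by hand, the parameters can be encoded with the standard library. This is only a sketch: the function name `build_explore_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the "ll" value readable
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# placeholder credentials, not real ones
example_url = build_explore_url(
    'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
    22.5414392, 88.3517606171429, 10000, 1000)
print(example_url)
```

Keeping the parameters in a dictionary makes it easier to see what is sent and avoids mistakes in the order of the format arguments.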

In [9]:
#getting the results for the API Explore query
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5df09664b4b684001bb5885c'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Kolkata',
  'headerFullLocation': 'Kolkata',
  'headerLocationGranularity': 'city',
  'totalResults': 165,
  'suggestedBounds': {'ne': {'lat': 22.63143929000009,
    'lng': 88.44902329029748},
   'sw': {'lat': 22.45143910999991, 'lng': 88.25449794398833}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d43fd7b4e5d370413f4e593',
       'name': 'The Blue Poppy',
       'location': {'address': '4/1, Middleton St',
        'lat': 22.548543210513543,
        'lng': 88.35135293517806,
        'labeledLatLngs': [{'label': 'display',
          'lat': 22.548543210513543,
      

In [10]:
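# extract the name of the first category of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # after json_normalize the field is prefixed with 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']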

In [11]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten the nested JSON into columns

# keep only the columns needed for the analysis
filtered_columns = ['venue.name', 'venue.id', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues

Unnamed: 0,venue.name,venue.id,venue.categories,venue.location.lat,venue.location.lng
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268
2,Victoria Memorial,4c0218258ef2c9b66d9c16fc,History Museum,22.545844,88.34289
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544
5,Oh! Calcutta,4c0a0ee8bbc676b0b24449d5,Bengali Restaurant,22.538357,88.351406
6,Yauatcha,534fda5e498e8696570687b8,Chinese Restaurant,22.539091,88.365573
7,Maidan,50ee506ee4b0f01d13d51008,Field,22.549906,88.344219
8,The Oberoi Grand,4d4c107ee1ec6dcbad95d475,Hotel,22.561749,88.351594
9,Nocturne,4d70225a9aac224bd12337ed,Nightclub,22.545078,88.357384


In [12]:

#fix the column names so they look relatively normal

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,id,categories,lat,lng
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268
2,Victoria Memorial,4c0218258ef2c9b66d9c16fc,History Museum,22.545844,88.34289
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544
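
The dotted column names seen before the rename come from flattening the nested JSON returned by the API. The following plain-Python sketch illustrates the idea on a single record; it is not how pandas implements it, and the `flatten` helper is hypothetical. The sample values are taken from the first row above.

```python
def flatten(record, parent=''):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = '{}.{}'.format(parent, key) if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            # lists and scalars are kept as they are
            flat[name] = value
    return flat


item = {'venue': {'name': 'The Blue Poppy',
                  'location': {'lat': 22.548543, 'lng': 88.351353}}}
flat = flatten(item)

# keep only the last part of each dotted key, as the rename step does
short = {key.split('.')[-1]: value for key, value in flat.items()}
print(flat)
print(short)
```

Note that keeping only the last part of a key is safe here because no two selected columns end in the same word.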


In [13]:
nearby_venues.shape

(100, 5)

### DATA MANIPULATION

Some of the venues are businesses that one would not visit on a tight schedule, so I filtered them out using the list **not_needed**.<br> The rest are split into food venues and other venues. The food venues will be clustered using k-means, while the others will be superimposed on the map.

In [14]:
not_needed=['Field','Cricket Ground','Arts & Crafts Store','Bookstore','Boutique','Sports Club','Golf Course','Department Store','Hotel']
filtered_venues=nearby_venues[~nearby_venues['categories'].isin(not_needed)]
others=['History Museum','Shopping Mall','Park','Harbor / Marina','Multiplex','Theme Park','Clothing Store','Botanical Garden']
is_other=filtered_venues['categories'].isin(others) # build the mask from the frame it is applied to
food_venues=filtered_venues[~is_other].copy() # copy so that later column assignments do not act on a slice
print('food_venues='+str(food_venues.shape))
other_venues=filtered_venues[is_other].copy()
print('other_venues='+str(other_venues.shape))

food_venues=(66, 5)
other_venues=(15, 5)




The Top_Locations dataframe and other_venues had 2 locations in common: Victoria Memorial Hall and Science City.<br>
To avoid duplicates, these locations will be removed from other_venues.
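
Duplicates could also be found by comparing names rather than by row position. This is a rough sketch under the assumption that a case-insensitive substring match is good enough; the helper `drop_known_places` is hypothetical and works on plain dictionaries rather than the dataframes used here. A name-based match may flag more rows than the two named above (Quest Mall, for instance, appears in both tables shown in this notebook), so the result should be reviewed before dropping anything.

```python
def drop_known_places(venues, known_names):
    """Remove venues whose name matches one of the known place names."""
    known = [name.casefold() for name in known_names]

    def is_known(venue_name):
        candidate = venue_name.casefold()
        # match when either name contains the other
        return any(candidate in name or name in candidate for name in known)

    return [venue for venue in venues if not is_known(venue['name'])]


venues = [{'name': 'Victoria Memorial'},
          {'name': 'Acropolis Mall'},
          {'name': 'Science City'}]
top_names = ['Victoria Memorial Hall', 'Science City', 'Howrah Bridge']

remaining = drop_known_places(venues, top_names)
print(remaining)
```

Spelling variants would still slip through a substring match, so some manual checking remains necessary.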

In [15]:
other_venues=other_venues.reset_index(drop=True)
other_venues=other_venues.drop([0, 9], axis=0) # rows 0 and 9 are the two duplicated locations
other_venues


Unnamed: 0,name,id,categories,lat,lng
1,Quest Mall,5274b7f111d2b513631071a5,Shopping Mall,22.539068,88.365525
2,Princep Ghat,4f179ce1e4b035bfa73f9f27,Harbor / Marina,22.559267,88.333057
3,Inox,528c3b2611d2e42e49b72774,Multiplex,22.539139,88.365559
4,Acropolis Mall,56042ace498efedb770e3dcc,Shopping Mall,22.514823,88.393235
5,Triangular Park,4fc0f8ade4b0d51624e249a9,Park,22.517924,88.358289
6,Deshapriya Park,4ce40d96dc85a1432f6b49d2,Park,22.518395,88.353177
7,South City Mall,4bb3427e715eef3bdd1286bb,Shopping Mall,22.501758,88.361726
8,INOX,4d4404083616b60c94b5dfc2,Multiplex,22.570893,88.401026
10,Shoppers Stop,4cd291d190f23704107e8be8,Clothing Store,22.501683,88.361759
11,Cinépolis,56042b21498e21a5e19ae496,Multiplex,22.514824,88.393236


In [16]:
venue_id_list = food_venues['id'].tolist()
venue_id_list


['4d43fd7b4e5d370413f4e593',
 '4d050f03347da1cd0d012f8f',
 '4d1c6d5a7e10a35dcb13ff82',
 '4c284edefe6e2d7f5417533c',
 '4c0a0ee8bbc676b0b24449d5',
 '534fda5e498e8696570687b8',
 '4d70225a9aac224bd12337ed',
 '5001966ae4b0444ec20e4135',
 '55e44fc0498e6b62fb94d77b',
 '4e2c0d7a18a80bb0585fae4d',
 '4beffa4ca09076b01ea229d4',
 '511b63ace4b00262d57ed073',
 '4befc8c03a002d7f386285a4',
 '57c9d100498e12bbb73fa6a4',
 '4d54ed50c6edf04d1af5bea2',
 '4cf3afe36c29236adc2472a2',
 '4e0b1e8422713e13018ee0a1',
 '534a4e4f498e504051fbb385',
 '4beebe513686c9b69bde246e',
 '4da1df087aee548112fec9fe',
 '4bbf785db083a593d77fa3e9',
 '5239ee910493e46a092f23c7',
 '4fd7482a7716469f62fb4560',
 '4c110a3a81e976b0291410eb',
 '5548d780498e5cef9ae2dcae',
 '4cf3b6667e93f04dd8a15969',
 '534fda8b498e30c1f4f3ccff',
 '4dfa2b8aae60f95f8225b9c9',
 '4d19f40dbb488cfa6628c1d4',
 '4cdea62ef8cdb1f7a35e8c12',
 '4c59af3e04f9be9a23ecef60',
 '534fd97c498effb74edc096b',
 '5234b18e04932a88f1ff430d',
 '514d613de4b0ab03fe0601fb',
 '4d145ea085fc

The next step is to get the number of likes each venue received. I used the Foursquare API to get this data.

In [17]:
#pull the likes from the API based on venue ID
like_list = []

for venue_id in venue_id_list:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(venue_url).json()
    likes = result['response']['likes']['count']
    like_list.append(likes)
print(like_list)

[23, 48, 74, 175, 20, 14, 36, 7, 10, 12, 21, 7, 151, 9, 39, 8, 100, 7, 80, 8, 24, 13, 7, 21, 10, 12, 6, 22, 20, 20, 8, 22, 10, 9, 14, 15, 40, 35, 54, 6, 20, 8, 13, 8, 13, 14, 41, 10, 11, 11, 5, 9, 1, 5, 5, 21, 5, 5, 16, 7, 5, 40, 16, 8, 14, 11]
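
The loop above assumes that every response contains a like count. A more defensive extraction might look like the sketch below. The helper name `extract_like_count` and the failed example payload are hypothetical; the `meta`/`code` and `response`/`likes`/`count` keys follow the responses shown in this notebook.

```python
def extract_like_count(payload):
    """Return the like count from a likes response, or None if unavailable."""
    if payload.get('meta', {}).get('code') != 200:
        # the request did not succeed, so there is nothing to read
        return None
    return payload.get('response', {}).get('likes', {}).get('count')


ok_payload = {'meta': {'code': 200}, 'response': {'likes': {'count': 23}}}
failed_payload = {'meta': {'code': 429}, 'response': {}}  # illustrative failure

ok_count = extract_like_count(ok_payload)
failed_count = extract_like_count(failed_payload)
print(ok_count, failed_count)
```

Venues for which `None` comes back could then be retried or excluded before the likes are added to the dataframe.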


In [18]:
print(len(like_list))
print(len(venue_id_list))

66
66


In [19]:
#add the likes list to the dataframe
food_venues['Likes']=like_list
food_venues.head()


Unnamed: 0,name,id,categories,lat,lng,Likes
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353,23
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268,48
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422,74
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544,175
5,Oh! Calcutta,4c0a0ee8bbc676b0b24449d5,Bengali Restaurant,22.538357,88.351406,20


In [20]:
#summary statistics of the Likes column, used to set the quality boundaries
food_venues['Likes'].describe()

count     66.000000
mean      23.318182
std       31.065271
min        1.000000
25%        8.000000
50%       13.000000
75%       21.750000
max      175.000000
Name: Likes, dtype: float64

From the output above, we obtain the 1st, 2nd and 3rd quartiles of the data. These will be used as our boundaries.
A custom function will apply our quality categories: poor, below average, above average and excellent.
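
The boundaries can also be computed from the data instead of being typed in. Here is a minimal sketch using only the standard library (Python 3.8 or later); the names `quartile_cutoffs` and `quality_labels` and the short sample list are illustrative assumptions. With `method='inclusive'`, `statistics.quantiles` interpolates linearly between data points, which should correspond to the quartiles reported by `describe()`.

```python
from bisect import bisect_left
from statistics import quantiles

QUALITY_LABELS = ['poor', 'below avg', 'above avg', 'excellent']


def quartile_cutoffs(likes):
    """Return the 1st, 2nd and 3rd quartiles of the like counts."""
    return quantiles(likes, n=4, method='inclusive')


def quality_labels(likes):
    """Label each value by the quartile range it falls into."""
    cutoffs = quartile_cutoffs(likes)
    # bisect_left puts a value equal to a cutoff into the lower category
    return [QUALITY_LABELS[bisect_left(cutoffs, value)] for value in likes]


sample_likes = [1, 5, 8, 13, 21, 40, 175]  # illustrative values only
sample_cutoffs = quartile_cutoffs(sample_likes)
sample_labels = quality_labels(sample_likes)
print(sample_cutoffs)
print(sample_labels)
```

Deriving the cutoffs this way keeps the categories consistent if the venue list changes.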

In [21]:
# let's set up a function that will re-categorize our restaurants based on likes

def conditions(s):
    if s['Likes']<=8:
        return 'poor'
    if s['Likes']<=13:
        return 'below avg'
    if s['Likes']<=21:
        return 'above avg'
    if s['Likes']>21:
        return 'excellent'

food_venues['Quality']=food_venues.apply(conditions, axis=1)


In [22]:
food_venues

Unnamed: 0,name,id,categories,lat,lng,Likes,Quality
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353,23,excellent
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268,48,excellent
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422,74,excellent
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544,175,excellent
5,Oh! Calcutta,4c0a0ee8bbc676b0b24449d5,Bengali Restaurant,22.538357,88.351406,20,above avg
6,Yauatcha,534fda5e498e8696570687b8,Chinese Restaurant,22.539091,88.365573,14,above avg
9,Nocturne,4d70225a9aac224bd12337ed,Nightclub,22.545078,88.357384,36,excellent
11,The Rouge,5001966ae4b0444ec20e4135,Café,22.541217,88.359267,7,poor
12,TGI Fridays,55e44fc0498e6b62fb94d77b,American Restaurant,22.538592,88.351398,10,below avg
13,Blue & Beyond,4e2c0d7a18a80bb0585fae4d,Pub,22.559131,88.35328,12,below avg


In [23]:
categories=food_venues.categories.unique().tolist()

The food venues will be further categorised into different cuisines for easy clustering.<br>
The 4 categories are:
1. Indian food
2. Other Asian food
3. Western food
4. Beverages
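
One possible way to express such a grouping is a single lookup table with a fallback for categories that were not assigned to any group. The sketch below uses shortened category lists, and the names `CUISINE_GROUPS`, `cuisine_group` and the `'Unmapped'` default are assumptions made for illustration.

```python
# shortened lists for illustration; the full lists are longer
CUISINE_GROUPS = {
    'Indian': ['Dhaba', 'Indian Restaurant', 'Bengali Restaurant'],
    'Other Asian': ['Asian Restaurant', 'Chinese Restaurant'],
    'Western food': ['Café', 'American Restaurant'],
    'Beverages': ['Nightclub', 'Bar', 'Pub'],
}

# invert the grouping into a category -> group lookup
CATEGORY_TO_GROUP = {
    category: group
    for group, categories in CUISINE_GROUPS.items()
    for category in categories
}


def cuisine_group(category, default='Unmapped'):
    """Return the cuisine group of a category, or a default if unknown."""
    return CATEGORY_TO_GROUP.get(category, default)


print(cuisine_group('Dhaba'))
print(cuisine_group('Pub'))
print(cuisine_group('Juice Bar'))
```

An explicit default makes unassigned categories visible instead of leaving them as missing values.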

In [24]:
# let's create our new categories and create a function to apply those to our existing data

indian_food = ['Dhaba','Diner', 'Indian Restaurant','Coffee Shop','Bengali Restaurant','Multicuisine Indian Restaurant','Indian Sweet Shop', 'Bakery','Restaurant','Plaza', 'Lounge', 'Awadhi Restaurant','Mughlai Restaurant']
asian_other_than_indian_food = ['Asian Restaurant','Chinese Restaurant','Tibetan Restaurant','Dumpling Restaurant']
western_food = ['Café', 'American Restaurant','BBQ Joint','Sandwich Place','Italian Restaurant','Falafel Restaurant', 'Tex-Mex Restaurant', 'Irish Pub']
bars = ['Nightclub','Bar','Pub','Gastropub','Brewery']

def conditions2(s):
    if s['categories'] in indian_food:
        return 'Indian'
    if s['categories'] in asian_other_than_indian_food:
        return 'Other Asian'
    if s['categories'] in western_food:
        return 'Western food'
    if s['categories'] in bars:
        return 'Beverages'
    
food_venues['Food categories']=food_venues.apply(conditions2, axis=1)


In [25]:
food_venues.head()

Unnamed: 0,name,id,categories,lat,lng,Likes,Quality,Food categories
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353,23,excellent,Other Asian
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268,48,excellent,Indian
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422,74,excellent,Indian
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544,175,excellent,Indian
5,Oh! Calcutta,4c0a0ee8bbc676b0b24449d5,Bengali Restaurant,22.538357,88.351406,20,above avg,Indian


One hot encoding will be done to convert the categorical values to numerical values.<br>
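
To make the idea concrete, here is a small plain-Python sketch of one-hot encoding for a single column. It is an illustration only, not the pandas implementation, and the `one_hot` helper is hypothetical.

```python
def one_hot(values):
    """Return the sorted distinct values and one 0/1 row per input value."""
    columns = sorted(set(values))
    rows = [[1 if value == column else 0 for column in columns]
            for value in values]
    return columns, rows


columns, rows = one_hot(['excellent', 'above avg', 'poor', 'excellent'])
print(columns)
for row in rows:
    print(row)
```

Each row contains exactly one 1, in the column that matches the original value.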

In [26]:
# one hot encoding
food_venues_onehot = pd.get_dummies(food_venues[['Food categories', 'Quality']], prefix="", prefix_sep="")

# add the venue name column back to the dataframe
food_venues_onehot['Name'] = food_venues['name'] 

fixed_columns = [food_venues_onehot.columns[-1]] + list(food_venues_onehot.columns[:-1])
food_venues_onehot = food_venues_onehot[fixed_columns]

food_venues_onehot.head()

Unnamed: 0,Name,Beverages,Indian,Other Asian,Western food,above avg,below avg,excellent,poor
0,The Blue Poppy,0,0,1,0,0,0,1,0
1,Jai Hind Dhaba,0,1,0,0,0,0,1,0
3,Balwant Singh's Eating House,0,1,0,0,0,0,1,0
4,Peter Cat,0,1,0,0,0,0,1,0
5,Oh! Calcutta,0,1,0,0,1,0,0,0


### MODELING WITH K-MEANS

In [27]:
df = food_venues_onehot.drop('Name', axis=1)

k_clusters = 4 # number of clusters was limited to 4 for simplicity 

# run k-means clustering
kmeans = KMeans(n_clusters=k_clusters, random_state=0).fit(df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 3, 3, 3, 2, 2, 3, 1, 0, 0, 2, 1, 3, 0, 3, 1, 3, 1, 3, 1, 3, 0,
       1, 2, 0, 0, 1, 3, 2, 2, 1, 3, 0, 0, 2, 2, 3, 3, 3, 1, 2, 1, 0, 1,
       0, 2, 3, 0, 0, 0, 1, 0, 1, 1, 1, 2, 1, 1, 2, 1, 1, 3, 2, 1, 2, 0],
      dtype=int32)

In [28]:
#adding the labels to the data frame
food_venues['Label'] = kmeans.labels_
food_venues.head()


Unnamed: 0,name,id,categories,lat,lng,Likes,Quality,Food categories,Label
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353,23,excellent,Other Asian,3
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268,48,excellent,Indian,3
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422,74,excellent,Indian,3
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544,175,excellent,Indian,3
5,Oh! Calcutta,4c0a0ee8bbc676b0b24449d5,Bengali Restaurant,22.538357,88.351406,20,above avg,Indian,2


In [29]:
# checking out the count in each cluster
food_venues.Label.value_counts()

1    20
3    17
0    15
2    14
Name: Label, dtype: int64
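
Before describing the clusters, it can help to count the quality categories inside each one. The sketch below does this in plain Python; the helper `quality_by_cluster` is hypothetical, and the five label/quality pairs are taken from rows shown in this notebook.

```python
from collections import Counter, defaultdict


def quality_by_cluster(labels, qualities):
    """Count the quality categories that occur in each cluster."""
    table = defaultdict(Counter)
    for label, quality in zip(labels, qualities):
        table[label][quality] += 1
    return {label: dict(counts) for label, counts in table.items()}


labels = [3, 3, 2, 1, 0]
qualities = ['excellent', 'excellent', 'above avg', 'poor', 'below avg']

summary = quality_by_cluster(labels, qualities)
# a cluster is "pure" when it contains a single quality category
is_pure = all(len(counts) == 1 for counts in summary.values())
print(summary)
print(is_pure)
```

If every cluster holds a single quality category, the clustering has effectively grouped the venues by quality.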

**Checking out the properties of each cluster:**

**Cluster 0:**<br>
Venue categories: Indian and Western foods dominate<br>
Quality: All of them are **below average**

In [30]:
cluster0=food_venues.loc[food_venues['Label']==0]
cluster0['Food categories'].value_counts()

Indian          6
Western food    4
Beverages       3
Other Asian     2
Name: Food categories, dtype: int64

In [31]:
cluster0

Unnamed: 0,name,id,categories,lat,lng,Likes,Quality,Food categories,Label
12,TGI Fridays,55e44fc0498e6b62fb94d77b,American Restaurant,22.538592,88.351398,10,below avg,Western food,0
13,Blue & Beyond,4e2c0d7a18a80bb0585fae4d,Pub,22.559131,88.35328,12,below avg,Beverages,0
19,Monkey Bar,57c9d100498e12bbb73fa6a4,Gastropub,22.544286,88.35196,9,below avg,Beverages,0
27,Arsalan,5239ee910493e46a092f23c7,Mughlai Restaurant,22.553897,88.354063,13,below avg,Indian,0
32,8th Day Cafe & Bakery,5548d780498e5cef9ae2dcae,Café,22.542407,88.360981,10,below avg,Western food,0
33,Marco Polo,4cf3b6667e93f04dd8a15969,Chinese Restaurant,22.551477,88.353827,12,below avg,Other Asian,0
46,India Restaurant & Caterer,5234b18e04932a88f1ff430d,Awadhi Restaurant,22.538904,88.322278,10,below avg,Indian,0
48,Paris Café,514d613de4b0ab03fe0601fb,Bakery,22.534915,88.364812,9,below avg,Indian,0
66,Girish Chandra Dey & Nakur Chandra Nandy,5085d451e4b0fedcb352298c,Indian Sweet Shop,22.59604,88.367485,13,below avg,Indian,0
68,The GRID,579747adcd1069eeb483cc39,Brewery,22.541975,88.386362,13,below avg,Beverages,0


**Cluster 1:**<br>
Venue categories: Indian foods dominate<br>
Quality: All of them are **poor**

In [32]:
cluster1=food_venues.loc[food_venues['Label']==1]
cluster1['Food categories'].value_counts()

Indian          12
Other Asian      4
Western food     4
Name: Food categories, dtype: int64

In [33]:
cluster1

Unnamed: 0,name,id,categories,lat,lng,Likes,Quality,Food categories,Label
11,The Rouge,5001966ae4b0444ec20e4135,Café,22.541217,88.359267,7,poor,Western food,1
16,Mithai,511b63ace4b00262d57ed073,Indian Sweet Shop,22.538354,88.36499,7,poor,Indian,1
21,The French Loaf,4cf3afe36c29236adc2472a2,Bakery,22.53924,88.354921,8,poor,Indian,1
23,Bombay Brasserie,534a4e4f498e504051fbb385,Indian Restaurant,22.53877,88.365532,7,poor,Indian,1
25,Sonargaon,4da1df087aee548112fec9fe,Indian Restaurant,22.537545,88.33421,8,poor,Indian,1
28,Balaram Mullick & Radharaman Mullick,4fd7482a7716469f62fb4560,Indian Sweet Shop,22.533097,88.347082,7,poor,Indian,1
35,Serafina,534fda8b498e30c1f4f3ccff,Italian Restaurant,22.539176,88.365451,6,poor,Western food,1
41,The Junction,4c59af3e04f9be9a23ecef60,Lounge,22.537845,88.334475,8,poor,Indian,1
59,Chowman,51bc401a498e587913694c22,Chinese Restaurant,22.526977,88.368509,6,poor,Other Asian,1
65,Amber,4dada0736e81d745d7456d26,Indian Restaurant,22.567544,88.351847,8,poor,Indian,1


**Cluster 2:**<br>
Venue categories: Indian and Western foods dominate<br>
Quality: All of them are **above average**

In [34]:
cluster2=food_venues.loc[food_venues['Label']==2]
cluster2['Food categories'].value_counts()

Western food    5
Indian          4
Other Asian     3
Beverages       2
Name: Food categories, dtype: int64

In [35]:
cluster2

Unnamed: 0,name,id,categories,lat,lng,Likes,Quality,Food categories,Label
5,Oh! Calcutta,4c0a0ee8bbc676b0b24449d5,Bengali Restaurant,22.538357,88.351406,20,above avg,Indian,2
6,Yauatcha,534fda5e498e8696570687b8,Chinese Restaurant,22.539091,88.365573,14,above avg,Other Asian,2
14,Underground,4beffa4ca09076b01ea229d4,Nightclub,22.54129,88.350629,21,above avg,Beverages,2
30,Shiraz Golden Restaurant,4c110a3a81e976b0291410eb,Mughlai Restaurant,22.546536,88.361654,21,above avg,Indian,2
38,Beijing Restaurant,4d19f40dbb488cfa6628c1d4,Chinese Restaurant,22.546518,88.387213,20,above avg,Other Asian,2
39,Golden Joy,4cdea62ef8cdb1f7a35e8c12,Chinese Restaurant,22.545592,88.387475,20,above avg,Other Asian,2
49,Subway,4d145ea085fc6dcbe22e974e,Sandwich Place,22.501667,88.361787,14,above avg,Western food,2
51,Go Lebanese,51409ae6e4b0c2b51cfd04b0,Falafel Restaurant,22.513831,88.353231,15,above avg,Western food,2
64,Gariahat Junction,4fd607c3121dc5ba008f840a,Plaza,22.519704,88.365422,20,above avg,Indian,2
69,Cafe Coffee Day,533ada3d498e2b453cfdcb32,Café,22.577619,88.390439,14,above avg,Western food,2


**Cluster 3:**<br>
Venue categories: Indian, Other Asian and Western foods dominate<br>
Quality: All of them are **excellent**

In [36]:
cluster3=food_venues.loc[food_venues['Label']==3]
cluster3['Food categories'].value_counts()

Indian          9
Other Asian     3
Western food    3
Beverages       2
Name: Food categories, dtype: int64

In [37]:
cluster3

Unnamed: 0,name,id,categories,lat,lng,Likes,Quality,Food categories,Label
0,The Blue Poppy,4d43fd7b4e5d370413f4e593,Asian Restaurant,22.548543,88.351353,23,excellent,Other Asian,3
1,Jai Hind Dhaba,4d050f03347da1cd0d012f8f,Dhaba,22.533109,88.353268,48,excellent,Indian,3
3,Balwant Singh's Eating House,4d1c6d5a7e10a35dcb13ff82,Dhaba,22.537714,88.34422,74,excellent,Indian,3
4,Peter Cat,4c284edefe6e2d7f5417533c,Indian Restaurant,22.552365,88.352544,175,excellent,Indian,3
9,Nocturne,4d70225a9aac224bd12337ed,Nightclub,22.545078,88.357384,36,excellent,Beverages,3
17,Flurys,4befc8c03a002d7f386285a4,Bakery,22.552786,88.352625,151,excellent,Indian,3
20,6 Ballygunge Place,4d54ed50c6edf04d1af5bea2,Bengali Restaurant,22.527712,88.368677,39,excellent,Indian,3
22,Bar-B-Q,4e0b1e8422713e13018ee0a1,BBQ Joint,22.553125,88.352625,100,excellent,Western food,3
24,Mocambo,4beebe513686c9b69bde246e,Restaurant,22.553206,88.353296,80,excellent,Indian,3
26,Aqua,4bbf785db083a593d77fa3e9,Lounge,22.554734,88.35218,24,excellent,Indian,3


### DATA VISUALIZATION

The **food_venues** will be marked on folium map with circle markers:<br>
<br>


|Cluster|Color|Quality|Food Category|
|-------|-----|-------|-------------| 
|0|Red|Below Average|Indian and Western| 
|1|Violet|Poor|Indian| 
|2|Cyan|Above Average|Indian and Western| 
|3|Yellow|Excellent|Indian, Other Asian and Western|
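
The color assignment in the table follows from indexing a four-color palette with `cluster-1`, where index `-1` wraps around to the last color. The sketch below reproduces that lookup with plain color names; the names only approximate the hex values of the rainbow colormap and are an assumption for illustration.

```python
# approximate names for four evenly spaced rainbow colors, in order
palette = ['violet', 'cyan', 'yellow', 'red']


def cluster_color(cluster):
    """Look up the color of a cluster using the cluster-1 index."""
    # for cluster 0 the index is -1, which selects the last color
    return palette[cluster - 1]


color_by_cluster = {cluster: cluster_color(cluster) for cluster in range(4)}
print(color_by_cluster)
```

Indexing with `cluster` directly would shift every color by one position, so the table would need to change with it.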


In [38]:
map_Kolkata = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, long, poi, cluster in zip(food_venues['lat'], food_venues['lng'], food_venues['name'], food_venues['Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=3,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_Kolkata)
       
map_Kolkata

Next, we will mark the **other_venues** in the folium map:<br>
Color: light gray<br>
Marker: Tooltip<br>
icon: Arrow-down

In [39]:
for lat, long, poi in zip(other_venues['lat'], other_venues['lng'], other_venues['name']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.Marker(
        [lat, long],
        popup=label,
        icon=folium.Icon(color='lightgray', icon='arrow-down')).add_to(map_Kolkata)
       
map_Kolkata

Next, we will add the **Top_Locations** dataframe to the map to visualize the must-see places:<br>
Color: blue<br>
Marker: Tooltip<br>
icon: Bookmark

In [42]:
for lat, long, poi in zip(Top_Locations['Lat'], Top_Locations['Long'], Top_Locations['name']):
    label = folium.Popup(str(poi), parse_html=True)
    tooltip='Click me!'
    folium.Marker(
        [lat, long],
        popup=label,
        icon=folium.Icon(color='blue', icon='bookmark')).add_to(map_Kolkata)
       
map_Kolkata

In [None]:
top_food=cluster3[['name','categories','Likes','Quality']] # a list of columns needs double brackets

Thanks for reviewing the notebook :)