*This Jupyter notebook contains the project "Facility Location Planning in India", completed as the final step of the IBM Data Science Professional Certificate on Coursera.*

# Facility Location Planning in India

In [1]:
import pandas as pd
import numpy as np
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim
import requests
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors


We extract city information and geographical coordinates for cities across India. The data is taken from the website "https://simplemaps.com/data/in-cities".

In [2]:
"""
!pip install beautifulsoup4
!pip install lxml
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "https://simplemaps.com/data/in-cities"
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

table = soup.find_all('table')[0] 
df_pc = pd.read_html(str(table))[0]
"""
# For simplicity, we download the data as a CSV file and read it with pandas
df_in = pd.read_csv("in.csv")

print("The dimensions of the dataframe are: ", df_in.shape)
df_in.head()

The dimensions of the dataframe are:  (212, 9)


Unnamed: 0,city,lat,lng,country,iso2,admin,capital,population,population_proper
0,Mumbai,18.987807,72.836447,India,IN,Mahārāshtra,admin,18978000.0,12691836.0
1,Delhi,28.651952,77.231495,India,IN,Delhi,admin,15926000.0,7633213.0
2,Kolkata,22.562627,88.363044,India,IN,West Bengal,admin,14787000.0,4631392.0
3,Chennai,13.084622,80.248357,India,IN,Tamil Nādu,admin,7163000.0,4328063.0
4,Bengalūru,12.977063,77.587106,India,IN,Karnātaka,admin,6787000.0,5104047.0


### Preliminary Data Cleaning

In [3]:
#Data cleaning
df_in.drop(['country', 'iso2', 'population_proper'], axis=1, inplace=True)
df_in.rename(columns={'city':'Location', 'lat':'Latitude', 'lng':'Longitude', 'admin':'State', 'population':'Population'}, inplace=True)
df_in.isna().sum()

Location        0
Latitude        0
Longitude       0
State           0
capital       174
Population      5
dtype: int64

In [7]:
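# Drop rows with any missing value first. Because `capital` is empty for
# non-capital cities, this keeps only capital cities with a known population.
df_in.dropna(axis=0, inplace=True)
# The `capital` column is not needed once it has served as a filter.
df_in.drop(['capital'], axis=1, inplace=True)
df_in.reset_index(inplace=True)
print(df_in.shape)
df_in.head(33)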

(33, 6)


Unnamed: 0,index,Location,Latitude,Longitude,State,Population
0,0,Mumbai,18.987807,72.836447,Mahārāshtra,18978000.0
1,1,Delhi,28.651952,77.231495,Delhi,15926000.0
2,2,Kolkata,22.562627,88.363044,West Bengal,14787000.0
3,3,Chennai,13.084622,80.248357,Tamil Nādu,7163000.0
4,4,Bengalūru,12.977063,77.587106,Karnātaka,6787000.0
5,5,Hyderabad,17.384052,78.456355,Andhra Pradesh,6376000.0
6,6,Ahmadābād,23.025793,72.587265,Gujarāt,5375000.0
7,12,Lucknow,26.839281,80.923133,Uttar Pradesh,2695000.0
8,14,Patna,25.615379,85.101027,Bihār,2158000.0
9,17,Bhopal,23.254688,77.402892,Madhya Pradesh,1727000.0


The dataframe df_in now contains 33 cities spread across different parts of India.
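The drop from 212 rows to 33 appears to come from how `dropna` treats the `capital` column, which is missing for 174 rows. The sketch below uses a made-up toy frame (the city names and values are illustrative, not taken from `in.csv`) to contrast dropping on every column with dropping on `Population` only.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values that mimics the relevant columns of df_in.
toy = pd.DataFrame({
    "Location": ["A", "B", "C", "D"],
    "capital": ["admin", np.nan, "primary", "admin"],
    "Population": [1000.0, 2000.0, np.nan, 4000.0],
})

# Without `subset`, a row is removed if ANY column is missing,
# so rows without a `capital` value are dropped too.
all_columns = toy.dropna(axis=0)

# Restricting the check to Population keeps non-capital cities.
population_only = toy.dropna(subset=["Population"])

print(all_columns["Location"].tolist())
print(population_only["Location"].tolist())
```

If the intent were to keep every city with a known population, the `subset` form would be the one to use; the notebook's results are based on the stricter variant.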

In [8]:
# to get the geocoordinates of available postal codes (disabled; kept for reference)

'''
!pip install pgeocode

import pgeocode

nomi = pgeocode.Nominatim('in')

lat_list = []
long_list = []

for postal_code in df_pc['Pincode']:
    lat_list.append(nomi.query_postal_code(postal_code).latitude)
    long_list.append(nomi.query_postal_code(postal_code).longitude)
'''
print(" ")

 


Visualizing the map of India with all the locations:

In [9]:
address = 'India'
# geopy expects an application-specific user agent; the name below is a placeholder
geolocator = Nominatim(user_agent="facility_location_in")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of India are {}, {}.'.format(latitude, longitude))

import folium

map_in = folium.Map(location=[latitude, longitude], zoom_start=4)

# add one marker per city, labelled with the city and its state
for lat, lng, city, state in zip(df_in['Latitude'], df_in['Longitude'], df_in['Location'], df_in['State']):
    label = '{}, {}'.format(city, state)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_in)

map_in


The geographical coordinates of India are 22.3511148, 78.6677428.


In [10]:
# We will use the Foursquare API to obtain the top 20 venues (if available) around each of the given locations.

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # replace with your Foursquare ID; do not publish real credentials
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # replace with your Foursquare Secret
VERSION = '20180605'

def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    LIMIT=20
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Location Latitude', 
                  'Location Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
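The function above formats the query string by hand and indexes straight into the response. As a minimal sketch, the two helpers below (`build_explore_url` and `extract_venues` are hypothetical names, not part of the notebook) encode the same parameters with the standard library and flatten a response defensively. The sample payload is made up and only mirrors the keys that `getNearbyVenues` reads.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=5000, limit=20):
    """Build the explore URL used above, with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

def extract_venues(payload, name, lat, lng):
    """Flatten one response into rows, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0].get("name")))
    return rows

# Made-up payload with the same nesting that getNearbyVenues expects.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Hotel",
               "location": {"lat": 18.99, "lng": 72.82},
               "categories": [{"name": "Hotel"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 18.98, "lng": 72.83},
               "categories": []}},
]}]}}

url = build_explore_url(18.98, 72.83, "ID", "SECRET")
rows = extract_venues(sample, "Mumbai", 18.98, 72.83)
empty = extract_venues({}, "Nowhere", 0.0, 0.0)
print(url)
print(rows)
```

If a venue were to come back with an empty category list, the original list comprehension would raise an `IndexError`; the sketch records `None` for that venue instead.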

The population column is left out of the analysis because population density varies widely between cities and the Foursquare search radius is limited.

In [11]:
in_venues = getNearbyVenues(names=df_in['Location'],
                                   latitudes=df_in['Latitude'],
                                   longitudes=df_in['Longitude']
                                  )

In [12]:
in_venues.head()

Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mumbai,18.987807,72.836447,The St. Regis Mumbai,18.993652,72.82522,Hotel
1,Mumbai,18.987807,72.836447,Bhau Daji Lad Museum,18.97914,72.834449,History Museum
2,Mumbai,18.987807,72.836447,High Street Phoenix,18.994967,72.825032,Shopping Mall
3,Mumbai,18.987807,72.836447,Smoke House Deli,18.994478,72.8244,Restaurant
4,Mumbai,18.987807,72.836447,Jai Hind Lunch Home,19.002183,72.829512,Seafood Restaurant


In [19]:
in_venues.groupby('Location').count()

Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agartala,11,11,11,11,11,11
Ahmadābād,20,20,20,20,20,20
Aizawl,5,5,5,5,5,5
Bengalūru,20,20,20,20,20,20
Bhopal,20,20,20,20,20,20
Bhubaneshwar,20,20,20,20,20,20
Chandigarh,20,20,20,20,20,20
Chennai,20,20,20,20,20,20
Daman,13,13,13,13,13,13
Dehra Dūn,20,20,20,20,20,20


*Please note the following:*

*1. Foursquare venue data for Indian cities is very limited.*

*2. Covid-19 lockdown instructions issued by the Indian government have restricted the functioning of public places and transport.*

In [20]:
# Keep only locations with at least 6 venues, to avoid clustering on very sparse data.
# Note: the result is stored in `in_venue`; the cells below still read the unfiltered `in_venues`.
in_venue = in_venues[in_venues.groupby('Location').Location.transform('count')>5]
in_venue.groupby('Location').count()

Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agartala,11,11,11,11,11,11
Ahmadābād,20,20,20,20,20,20
Bengalūru,20,20,20,20,20,20
Bhopal,20,20,20,20,20,20
Bhubaneshwar,20,20,20,20,20,20
Chandigarh,20,20,20,20,20,20
Chennai,20,20,20,20,20,20
Daman,13,13,13,13,13,13
Dehra Dūn,20,20,20,20,20,20
Delhi,20,20,20,20,20,20


In [21]:
# We have 28 cities available now for analysis

# one hot encoding
in_onehot = pd.get_dummies(in_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Location column back to the dataframe
in_onehot['Location'] = in_venues['Location'] 

# move the Location column to the first position
fixed_columns = [in_onehot.columns[-1]] + list(in_onehot.columns[:-1])
in_onehot = in_onehot[fixed_columns]

in_onehot.head()

Unnamed: 0,Location,ATM,Afghan Restaurant,Airport,Airport Terminal,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Tea Room,Temple,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Women's Store
0,Mumbai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Mumbai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Mumbai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Mumbai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Mumbai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
in_onehot.shape

(518, 132)

In [23]:
in_grouped = in_onehot.groupby('Location').mean().reset_index()
in_grouped.head()

Unnamed: 0,Location,ATM,Afghan Restaurant,Airport,Airport Terminal,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Tea Room,Temple,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Women's Store
0,Agartala,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ahmadābād,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aizawl,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bengalūru,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bhopal,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
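Each value in this grouped table is the share of a city's venues that fall into a category; for example, 0.272727 for ATM in Agartala corresponds to 3 of its 11 venues. The sketch below reproduces the one-hot-then-mean step on a made-up toy frame to show where these fractions come from.

```python
import pandas as pd

# Toy venue list with made-up entries.
toy = pd.DataFrame({
    "Location": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Hotel", "Hotel", "Café", "Park", "Café", "Café"],
})

# One indicator column per category, then the per-city mean of each indicator.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Location", toy["Location"])
freq = onehot.groupby("Location").mean()

print(freq)
```

Because every venue has exactly one category here, each row of `freq` sums to 1.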


### Clustering and Visualization

In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 9

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Location'] = in_grouped['Location']

for ind in np.arange(in_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(in_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
0,Agartala,ATM,Salad Place,Hotel,Men's Store,Optical Shop,Platform,Coffee Shop,Park,Indian Restaurant
1,Ahmadābād,Indian Restaurant,Hotel,Fast Food Restaurant,Snack Place,Cricket Ground,Historic Site,Farmers Market,Café,Bookstore
2,Aizawl,Asian Restaurant,Shopping Mall,Café,Park,Hyderabadi Restaurant,French Restaurant,Department Store,Dessert Shop,Diner
3,Bengalūru,Lounge,Hotel,Japanese Restaurant,Deli / Bodega,Cricket Ground,Park,French Restaurant,Racetrack,Burger Joint
4,Bhopal,Indian Restaurant,Hotel,Bakery,Market,Fast Food Restaurant,Coffee Shop,Clothing Store,Pub,Food Court
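The ranking step can also be written without the `try`/`except` around the suffix list. This is a sketch under stated assumptions: `ordinal` and `top_categories` are hypothetical helper names, and the frequency table is a made-up toy indexed by location.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix (1st, 2nd, 3rd, 4th, 11th, ...)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def top_categories(freq, n):
    """freq: DataFrame indexed by location with one column per category."""
    rows = {}
    for loc, row in freq.iterrows():
        # Sort categories by frequency and keep the n largest.
        rows[loc] = row.sort_values(ascending=False).index[:n].tolist()
    cols = [f"{ordinal(i + 1)} Most Common Venue" for i in range(n)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=cols)

# Toy frequencies with made-up values.
freq = pd.DataFrame(
    {"Hotel": [0.5, 0.1], "Café": [0.3, 0.7], "Park": [0.2, 0.2]},
    index=["A", "B"],
)
top = top_categories(freq, 2)
print(top)
```

One caveat of this ranking, in the notebook as well as in the sketch: when a city has fewer distinct categories than `n`, the remaining slots are filled with categories whose frequency is zero, so the tail of a sparse city's list carries no information.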


In [40]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

in_grouped_clustering = in_grouped.drop('Location', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(in_grouped_clustering)
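The notebook fixes `kclusters = 5` without showing how the value was chosen. One common check, sketched here on synthetic data rather than on `in_grouped_clustering`, is to compare the k-means inertia (within-cluster sum of squares) over a range of k and look for the point where the improvement flattens. The data, the range of k, and `n_init=10` are all assumptions of this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for several values of k and record the inertia of each fit.
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On the real data the same loop could be run with `X = in_grouped_clustering`, keeping in mind that 33 cities is a small sample for this kind of comparison.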

In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

in_merged = df_in

# merge the per-city venue summary into df_in to attach latitude/longitude to each city
in_merged = in_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

in_merged.reset_index(drop=True, inplace=True)
in_merged['Cluster Labels'].astype(int)  # note: the cast is not assigned back, so this line has no effect
in_merged.head()

Unnamed: 0,index,Location,Latitude,Longitude,State,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
0,0,Mumbai,18.987807,72.836447,Mahārāshtra,18978000.0,3,Hotel,Indian Restaurant,Shopping Mall,Lounge,Cupcake Shop,Cosmetics Shop,Comedy Club,Pizza Place,Pub
1,1,Delhi,28.651952,77.231495,Delhi,15926000.0,3,Indian Restaurant,Hotel,Lounge,Ice Cream Shop,Plaza,Monument / Landmark,Mosque,Snack Place,South Indian Restaurant
2,2,Kolkata,22.562627,88.363044,West Bengal,14787000.0,3,Chinese Restaurant,Indian Restaurant,Nightclub,Hotel,Mughlai Restaurant,Bakery,Bookstore,Pub,Lounge
3,3,Chennai,13.084622,80.248357,Tamil Nādu,7163000.0,3,Indian Restaurant,Italian Restaurant,Multiplex,Ice Cream Shop,Multicuisine Indian Restaurant,Men's Store,Chinese Restaurant,Chocolate Shop,Movie Theater
4,4,Bengalūru,12.977063,77.587106,Karnātaka,6787000.0,3,Lounge,Hotel,Japanese Restaurant,Deli / Bodega,Cricket Ground,Park,French Restaurant,Racetrack,Burger Joint


In [42]:
in_merged.drop(['index'], axis=1, inplace=True)
import math
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(in_merged['Latitude'], in_merged['Longitude'], in_merged['Location'], in_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # index the palette directly by cluster label
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Analysis of Clusters

In [43]:
#Cluster 0
in_merged.loc[in_merged['Cluster Labels'] == 0, in_merged.columns[[0] + list(range(5, in_merged.shape[1]))]]

Unnamed: 0,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
5,Hyderabad,0,Indian Restaurant,Bakery,Ice Cream Shop,South Indian Restaurant,Juice Bar,Food Truck,Shopping Mall,Multiplex,Lounge
7,Lucknow,0,Indian Restaurant,Café,Fast Food Restaurant,Bakery,Shopping Mall,History Museum,Flea Market,Tea Room,Neighborhood
8,Patna,0,Café,Park,Fast Food Restaurant,Pizza Place,Indian Restaurant,American Restaurant,Juice Bar,Multiplex,Chinese Restaurant
10,Srīnagar,0,Café,Lake,Shopping Mall,Indian Restaurant,Bakery,Chinese Restaurant,Fried Chicken Joint,Flea Market,Park
11,Ranchi,0,Indian Restaurant,Pizza Place,Shopping Mall,Train Station,Hotel Bar,Hotel,Café,Multiplex,Fast Food Restaurant
12,Chandigarh,0,Bakery,Ice Cream Shop,BBQ Joint,Chinese Restaurant,Salon / Barbershop,Sandwich Place,Sculpture Garden,Garden,Coffee Shop
13,Thiruvananthapuram,0,Indian Restaurant,Fast Food Restaurant,Movie Theater,Multiplex,Ice Cream Shop,Jewelry Store,Food & Drink Shop,Planetarium,Food Truck
14,Raipur,0,Café,Shopping Mall,Fast Food Restaurant,Multiplex,Coffee Shop,Vegetarian / Vegan Restaurant,Italian Restaurant,Pizza Place,Flea Market
15,Bhubaneshwar,0,Hotel,Fast Food Restaurant,Coffee Shop,Multiplex,Park,Sandwich Place,Resort,Chinese Restaurant,Shopping Mall
16,Dehra Dūn,0,Fast Food Restaurant,Café,Indian Restaurant,Flea Market,Pizza Place,Ice Cream Shop,Multiplex,Food Court,Bakery


In [44]:
in_merged.loc[in_merged['Cluster Labels'] == 1, in_merged.columns[[0] + list(range(5, in_merged.shape[1]))]]

Unnamed: 0,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
28,Itanagar,1,Women's Store,Train Station,Flea Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food


In [45]:
in_merged.loc[in_merged['Cluster Labels'] == 2, in_merged.columns[[0] + list(range(5, in_merged.shape[1]))]]

Unnamed: 0,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
32,Kavaratti,2,Boat or Ferry,Bank,Food & Drink Shop,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food


In [46]:
in_merged.loc[in_merged['Cluster Labels'] == 3, in_merged.columns[[0] + list(range(5, in_merged.shape[1]))]]

Unnamed: 0,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
0,Mumbai,3,Hotel,Indian Restaurant,Shopping Mall,Lounge,Cupcake Shop,Cosmetics Shop,Comedy Club,Pizza Place,Pub
1,Delhi,3,Indian Restaurant,Hotel,Lounge,Ice Cream Shop,Plaza,Monument / Landmark,Mosque,Snack Place,South Indian Restaurant
2,Kolkata,3,Chinese Restaurant,Indian Restaurant,Nightclub,Hotel,Mughlai Restaurant,Bakery,Bookstore,Pub,Lounge
3,Chennai,3,Indian Restaurant,Italian Restaurant,Multiplex,Ice Cream Shop,Multicuisine Indian Restaurant,Men's Store,Chinese Restaurant,Chocolate Shop,Movie Theater
4,Bengalūru,3,Lounge,Hotel,Japanese Restaurant,Deli / Bodega,Cricket Ground,Park,French Restaurant,Racetrack,Burger Joint
6,Ahmadābād,3,Indian Restaurant,Hotel,Fast Food Restaurant,Snack Place,Cricket Ground,Historic Site,Farmers Market,Café,Bookstore
9,Bhopal,3,Indian Restaurant,Hotel,Bakery,Market,Fast Food Restaurant,Coffee Shop,Clothing Store,Pub,Food Court
18,New Delhi,3,Hotel,Indian Restaurant,Park,History Museum,Café,Karnataka Restaurant,Restaurant,Northeast Indian Restaurant,Market
21,Puducherry,3,Hotel,Pizza Place,Beach,Park,Bakery,Lounge,Spa,Coffee Shop,BBQ Joint
22,Agartala,3,ATM,Salad Place,Hotel,Men's Store,Optical Shop,Platform,Coffee Shop,Park,Indian Restaurant


In [47]:
in_merged.loc[in_merged['Cluster Labels'] == 4, in_merged.columns[[0] + list(range(5, in_merged.shape[1]))]]

Unnamed: 0,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
19,Aizawl,4,Asian Restaurant,Shopping Mall,Café,Park,Hyderabadi Restaurant,French Restaurant,Department Store,Dessert Shop,Diner


From the clustering of Indian cities, we obtain the following conclusions:

1. Cluster-0 is the group of Tier-1 cities across the country. For someone who prefers a steady job (government or private) and a hassle-free living environment, these cities should be a good choice to settle in.

2. Cluster-1 contains only Itanagar, which lies at the eastern end of India. The city has less structural development, which is apparent from its most common venue categories. It is also worth noting that its state, Arunachal Pradesh, has the lowest population density in India.

3. Cluster-2 contains a single city, Kavaratti. Its most common venue categories suggest a tourist destination known for natural beauty; Kavaratti is the capital of the Lakshadweep islands. Tourism-oriented businesses, especially those serving international visitors, could do well here.

4. Cluster-3 groups together most of the metro and most-developed cities in the country. Multinational companies and Indian start-ups should prefer setting up their main offices in one of these cities for networking and business development.

5. Cluster-4 contains Aizawl, which is also a city in a less developed area. However, compared to Cluster-1, this city is more developed and has more facilities available.
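Three of the five clusters above contain a single city each, which is worth keeping in mind when reading these conclusions. A quick size check makes such singleton clusters explicit; the label series below is made up for illustration and would be replaced by `in_merged['Cluster Labels']` in the notebook.

```python
import pandas as pd

# Made-up cluster labels; in the notebook this would be in_merged['Cluster Labels'].
labels = pd.Series([3, 3, 0, 0, 0, 1, 2, 4], name="Cluster Labels")

# Number of cities per cluster, ordered by cluster label.
sizes = labels.value_counts().sort_index()

# Clusters that contain exactly one city.
singletons = sizes[sizes == 1].index.tolist()

print(sizes.to_dict())
print(singletons)
```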

### Disclaimer:

*This analysis is part of the author's learning exercise and should be verified before use. The outcomes rely heavily on the input data, so any discrepancy in the data can cause large variations in the results.*

Thank you for taking your time to read my work!

**Abhay Sobhanan**