Step 1: We import all the libraries required for our cluster analysis.

In [53]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (the pandas.io.json import path is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.9.1
  latest version: 4.9.2

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


_________________________________________________________________________________________________________________________________________________________________
Step 2: We import the latitude and longitude data for all postal codes in India downloaded from the website http://download.geonames.org/export/zip/.

In [54]:
df_IN = pd.read_csv('IN.csv')
df_IN.head()


Unnamed: 0,countrycode,postalcode,placename,adminname1,admincode1,adminname2,admincode2,adminname3,admincode3,latitude,longitude,accuracy
0,IN,744301,Sawai,Andaman & Nicobar Islands,1,Nicobar,638,Carnicobar,,7.5166,93.6031,4.0
1,IN,744301,Carnicobar,Andaman & Nicobar Islands,1,Nicobar,638,Carnicobar,,9.1833,92.7667,3.0
2,IN,744301,Mus,Andaman & Nicobar Islands,1,Nicobar,638,Carnicobar,,9.2333,92.7833,4.0
3,IN,744301,Lapathy,Andaman & Nicobar Islands,1,Nicobar,638,Carnicobar,,9.1833,92.7667,3.0
4,IN,744301,Kakana,Andaman & Nicobar Islands,1,Nicobar,638,Carnicobar,,9.1167,92.8,4.0


_________________________________________________________________________________________________________________________________________________________________
Step 3: We remove duplicate postal codes from our dataframe df_IN, keeping the first entry for each postal code. This ensures that every postal code maps to exactly one latitude and longitude pair.

In [55]:
df_IN2 = df_IN.drop_duplicates(subset='postalcode', keep="first")
df_IN2.head()

Unnamed: 0,countrycode,postalcode,placename,adminname1,admincode1,adminname2,admincode2,adminname3,admincode3,latitude,longitude,accuracy
0,IN,744301,Sawai,Andaman & Nicobar Islands,1,Nicobar,638,Carnicobar,,7.5166,93.6031,4.0
5,IN,744302,Shabnamnagar,Andaman & Nicobar Islands,1,Nicobar,638,Nancorie,,9.1833,92.7667,1.0
11,IN,744303,Nancowrie,Andaman & Nicobar Islands,1,Nicobar,638,Nancowrie,,9.1833,92.7667,1.0
16,IN,744304,Kapanga,Andaman & Nicobar Islands,1,Nicobar,638,Nancowrie,,9.1833,92.7667,1.0
19,IN,744201,Betapur,Andaman & Nicobar Islands,1,North And Middle Andaman,639,Rangat,,12.7167,92.9,4.0
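
As a minimal sketch of what keep="first" does, the toy dataframe below reuses four rows from the output above (the name toy and the averaging alternative are illustrative assumptions, not part of the notebook). Keeping the first row is simple, but it picks one place's coordinates arbitrarily; averaging per postal code is one possible alternative.

```python
import pandas as pd

# Toy stand-in for df_IN, built from rows shown in the output above.
toy = pd.DataFrame({
    "postalcode": [744301, 744301, 744301, 744302],
    "placename": ["Sawai", "Carnicobar", "Mus", "Shabnamnagar"],
    "latitude": [7.5166, 9.1833, 9.2333, 9.1833],
})

# keep="first" retains only the first row encountered for each postal code.
deduped = toy.drop_duplicates(subset="postalcode", keep="first")

# Alternative (not what the notebook does): average the latitude per postal code.
averaged = toy.groupby("postalcode", as_index=False)["latitude"].mean()

print(deduped)
print(averaged)
```

With keep="first", postal code 744301 is represented by the 'Sawai' row only, even though the other places under that code have different coordinates.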


_________________________________________________________________________________________________________________________________________________________________
Step 4: We import the all-India postal code (pincode) directory downloaded from the official government website: https://data.gov.in.

In [56]:
df_Pin = pd.read_csv('Pincode_30052019.csv')
df_Pin.head()

Unnamed: 0,CircleName,RegionName,DivisionName,OfficeName,Pincode,OfficeType,Delivery,District,StateName
0,Andhra Pradesh Circle,Kurnool Region,Anantapur Division,A Narayanapuram B.O,515004,BO,Delivery,ANANTHAPUR,Andhra Pradesh
1,Andhra Pradesh Circle,Kurnool Region,Anantapur Division,Akuledu B.O,515731,BO,Delivery,ANANTHAPUR,Andhra Pradesh
2,Andhra Pradesh Circle,Kurnool Region,Anantapur Division,Alamuru B.O,515002,BO,Delivery,ANANTHAPUR,Andhra Pradesh
3,Andhra Pradesh Circle,Kurnool Region,Anantapur Division,Allapuram B.O,515766,BO,Delivery,ANANTHAPUR,Andhra Pradesh
4,Andhra Pradesh Circle,Kurnool Region,Anantapur Division,Aluru B.O,515415,BO,Delivery,ANANTHAPUR,Andhra Pradesh


_________________________________________________________________________________________________________________________________________________________________
Step 5: Let us check the unique entries in the column 'RegionName' to ensure it has the region Navi Mumbai.

In [57]:
df_Pin.RegionName.unique()

array(['Kurnool Region', 'Vijayawada Region', 'Visakhapatnam Region', nan,
       'Dibrugarh Region', 'East Region, Bhagalpur', 'Muzaffarpur Region',
       'Raipur Region', 'Ahmedabad HQ Region', 'Rajkot Region',
       'Vadodara Region', 'Srinagar HQ Region', 'Bengaluru HQ Region',
       'North Karnataka Region', 'South Karnataka Region',
       'Calicut Region', 'Kochi Region', 'Indore Region',
       'Jabalpur Region', 'Aurangabad Region', 'Goa-Panaji Region',
       'Mumbai Region', 'Nagpur Region', 'Navi Mumbai Region',
       'Pune Region', 'North Eastern Region', 'Shillong HQ Region',
       'Berhampur Region', 'Sambalpur Region', 'Punjab West Region',
       'Ajmer Region', 'Jodhpur Region', 'Central Region, Trichirapalli',
       'Chennai City Region', 'Southern Region, Madurai',
       'Western Region, Coimbatore', 'Hyderabad City Region',
       'Hyderabad Region', 'Agra Region', 'Allahabad Region',
       'Bareilly Region', 'Gorakhpur Region', 'Kanpur Region',
       'Luc

_________________________________________________________________________________________________________________________________________________________________
Step 6: Now that we have confirmed Navi Mumbai is a region in the given dataset, we will separate it from the main dataset.

In [58]:
is_Navi =  ['Navi Mumbai Region']
df_Nmum = df_Pin[df_Pin.RegionName.isin(is_Navi)]
df_Nmum.head()

Unnamed: 0,CircleName,RegionName,DivisionName,OfficeName,Pincode,OfficeType,Delivery,District,StateName
78787,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,423502,SO,Delivery,Jalgaon,Maharashtra
78788,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Adgaon B.O,423101,BO,Delivery,Jalgaon,Maharashtra
78789,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,423201,BO,Delivery,Malegaon,Maharashtra
78790,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,423208,BO,Delivery,Malegaon,Maharashtra
78791,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,422209,BO,Delivery,Malegaon,Maharashtra


_________________________________________________________________________________________________________________________________________________________________
Step 7: Let us check the shape of our new dataset.

In [59]:
df_Nmum.shape

(1556, 9)

_________________________________________________________________________________________________________________________________________________________________
Step 8: We will separate the pincodes from this new dataset, as we will need them later for joining with the latitude and longitude data.

In [60]:
Pincode = df_Nmum["Pincode"]
Pincode.head()

78787    423502
78788    423101
78789    423201
78790    423208
78791    422209
Name: Pincode, dtype: int64

In [61]:
Pincode.shape

(1556,)

In [62]:
df_final1 = Pincode.drop_duplicates()
df_final1.shape

(227,)

In [63]:
df_final1.head()

78787    423502
78788    423101
78789    423201
78790    423208
78791    422209
Name: Pincode, dtype: int64

In [64]:
df_final2 = df_final1.to_frame()
df_final2.head()

Unnamed: 0,Pincode
78787,423502
78788,423101
78789,423201
78790,423208
78791,422209


In [65]:
df_final2.shape

(227, 1)

_________________________________________________________________________________________________________________________________________________________________
Step 9: We will merge the unique pincodes back with df_Nmum so that we get all the required information on the neighborhoods and boroughs for the Navi Mumbai Region only.

In [66]:
df_final3 = pd.merge(df_final2, df_Nmum, on='Pincode', how='left', validate="one_to_many")
df_final3.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName
0,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,SO,Delivery,Jalgaon,Maharashtra
1,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Bordaivat B.O,BO,Delivery,Malegaon,Maharashtra
2,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Chankapur B.O,BO,Delivery,Malegaon,Maharashtra
3,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Dalwat B.O,BO,Delivery,Malegaon,Maharashtra
4,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Desgaon B.O,BO,Delivery,Malegaon,Maharashtra


In [67]:
df_final3.shape

(1556, 9)
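
The validate="one_to_many" argument asks pandas to check that the merge keys are unique on the left side. The sketch below illustrates this on toy frames (unique_pins, offices and dup_pins are hypothetical names); when the left keys are duplicated, pandas raises a MergeError, which is a subclass of ValueError.

```python
import pandas as pd

# Left side: unique pincodes. Right side: several offices per pincode.
unique_pins = pd.DataFrame({"Pincode": [423502, 423101]})
offices = pd.DataFrame({
    "Pincode": [423502, 423502, 423101],
    "OfficeName": ["Abhona S.O", "Bordaivat B.O", "Adgaon B.O"],
})

# Passes validation: every key in the left frame occurs once.
merged = pd.merge(unique_pins, offices, on="Pincode", how="left",
                  validate="one_to_many")
print(merged.shape)

# Fails validation: the left frame now contains a duplicated key.
dup_pins = pd.DataFrame({"Pincode": [423502, 423502]})
try:
    pd.merge(dup_pins, offices, on="Pincode", how="left",
             validate="one_to_many")
    raised = False
except ValueError:  # pandas raises MergeError, a ValueError subclass
    raised = True
print(raised)
```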

_________________________________________________________________________________________________________________________________________________________________
Step 10: We will remove all the duplicate entries from our new dataset and keep just the first entry.

In [68]:
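# .copy() gives an independent dataframe, which avoids pandas' SettingWithCopyWarning when columns are assigned later
df_final4 = df_final3.drop_duplicates(subset='Pincode', keep="first").copy()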
df_final4.shape

(227, 9)

In [69]:
df_final4.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName
0,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,SO,Delivery,Jalgaon,Maharashtra
12,423101,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Adgaon B.O,BO,Delivery,Jalgaon,Maharashtra
32,423201,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,BO,Delivery,Malegaon,Maharashtra
37,423208,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,BO,Delivery,Malegaon,Maharashtra
47,422209,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,BO,Delivery,Malegaon,Maharashtra


_________________________________________________________________________________________________________________________________________________________________
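Step 11: We will preview the dataset without the unnecessary columns. The result of drop() is not assigned back, so df_final4 itself keeps all of its columns in the steps that follow.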

In [70]:
df_final4.drop(columns=['CircleName', 'OfficeType', 'Delivery']).head()

Unnamed: 0,Pincode,RegionName,DivisionName,OfficeName,District,StateName
0,423502,Navi Mumbai Region,Malegaon Division,Abhona S.O,Jalgaon,Maharashtra
12,423101,Navi Mumbai Region,Malegaon Division,Adgaon B.O,Jalgaon,Maharashtra
32,423201,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,Malegaon,Maharashtra
37,423208,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,Malegaon,Maharashtra
47,422209,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,Malegaon,Maharashtra


_________________________________________________________________________________________________________________________________________________________________
Step 12: We will add the columns latitude and longitude to our new dataset.

In [71]:
df_final4['latitude'] = ''


In [72]:
df_final4['longitude'] = ''


In [73]:
df_final4.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName,latitude,longitude
0,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,SO,Delivery,Jalgaon,Maharashtra,,
12,423101,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Adgaon B.O,BO,Delivery,Jalgaon,Maharashtra,,
32,423201,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,BO,Delivery,Malegaon,Maharashtra,,
37,423208,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,BO,Delivery,Malegaon,Maharashtra,,
47,422209,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,BO,Delivery,Malegaon,Maharashtra,,


_________________________________________________________________________________________________________________________________________________________________
Step 13: We will map the latitudes and longitudes with our dataset using the earlier loaded dataset of latitudes and longitudes.

In [74]:
df_final4['latitude'] = df_final4.Pincode.map(df_IN2.set_index('postalcode')['latitude'])


In [75]:
df_final4['longitude'] = df_final4.Pincode.map(df_IN2.set_index('postalcode')['longitude'])


In [76]:
df_final4.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName,latitude,longitude
0,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,SO,Delivery,Jalgaon,Maharashtra,20.0947,73.9282
12,423101,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Adgaon B.O,BO,Delivery,Jalgaon,Maharashtra,20.3237,74.2071
32,423201,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,BO,Delivery,Malegaon,Maharashtra,20.5498,74.4557
37,423208,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,BO,Delivery,Malegaon,Maharashtra,20.2592,74.0714
47,422209,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,BO,Delivery,Malegaon,Maharashtra,20.1704,73.9923


In [77]:
df_final4.shape

(227, 11)
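
The lookup in Step 13 relies on Series.map with a Series indexed by postal code. The following is a small sketch under assumed toy data (coords, offices and the pincode 999999 are made up for illustration): taking an explicit .copy() before assigning columns is one common way to avoid the SettingWithCopyWarning, and any pincode without a match ends up as NaN.

```python
import pandas as pd

# Toy coordinate table, mirroring two rows of the mapped output above.
coords = pd.DataFrame({
    "postalcode": [423502, 423101],
    "latitude": [20.0947, 20.3237],
    "longitude": [73.9282, 74.2071],
})
offices = pd.DataFrame({
    "Pincode": [423502, 423101, 999999],  # 999999 is a made-up code with no match
    "OfficeName": ["Abhona S.O", "Adgaon B.O", "Hypothetical B.O"],
})

# .copy() makes the filtered result independent of the original frame.
subset = offices[offices["Pincode"] > 0].copy()

# Index the coordinates by postal code once, then look each pincode up.
lookup = coords.set_index("postalcode")
subset["latitude"] = subset["Pincode"].map(lookup["latitude"])
subset["longitude"] = subset["Pincode"].map(lookup["longitude"])
print(subset)
```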

_________________________________________________________________________________________________________________________________________________________________
Step 14: Let us check for null values in our final dataset.

In [78]:
df_final4.isnull().sum(axis = 0)

Pincode         0
CircleName      0
RegionName      0
DivisionName    0
OfficeName      0
OfficeType      0
Delivery        0
District        0
StateName       0
latitude        1
longitude       1
dtype: int64

_________________________________________________________________________________________________________________________________________________________________
Step 15: We shall drop the rows that have null values in the latitude or longitude columns.

In [79]:
df_final4['latitude'].replace('', np.nan, inplace=True)


In [80]:
df_final4['longitude'].replace('', np.nan, inplace=True)

In [81]:
df_final4.dropna(subset=['latitude'], inplace=True)


In [82]:
df_final4.dropna(subset=['longitude'], inplace=True)


In [83]:
df_final4.isnull().sum(axis = 0)

Pincode         0
CircleName      0
RegionName      0
DivisionName    0
OfficeName      0
OfficeType      0
Delivery        0
District        0
StateName       0
latitude        0
longitude       0
dtype: int64

In [84]:
df_final4.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName,latitude,longitude
0,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,SO,Delivery,Jalgaon,Maharashtra,20.0947,73.9282
12,423101,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Adgaon B.O,BO,Delivery,Jalgaon,Maharashtra,20.3237,74.2071
32,423201,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,BO,Delivery,Malegaon,Maharashtra,20.5498,74.4557
37,423208,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,BO,Delivery,Malegaon,Maharashtra,20.2592,74.0714
47,422209,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,BO,Delivery,Malegaon,Maharashtra,20.1704,73.9923
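
The two dropna calls above could also be written as a single call that names both columns. A minimal sketch on toy data (frame and the pincode 999999 are illustrative assumptions):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "Pincode": [423502, 423101, 999999],  # 999999 is a made-up code
    "latitude": [20.0947, 20.3237, np.nan],
    "longitude": [73.9282, 74.2071, np.nan],
})

# Drop any row where either coordinate is missing, then renumber the rows.
cleaned = frame.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)
print(cleaned.isnull().sum())
```

Returning a new dataframe instead of using inplace=True also sidesteps the copy-versus-view ambiguity that pandas warns about.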


In [85]:
from csv import reader
import pandas as pd

_________________________________________________________________________________________________________________________________________________________________
Our dataset is ready with all the latitudes and longitudes, so we can now check the data on folium Maps.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

In [86]:
df_final5=pd.DataFrame(df_final4)
df_final5.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName,latitude,longitude
0,423502,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Abhona S.O,SO,Delivery,Jalgaon,Maharashtra,20.0947,73.9282
12,423101,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Adgaon B.O,BO,Delivery,Jalgaon,Maharashtra,20.3237,74.2071
32,423201,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar BK B.O,BO,Delivery,Malegaon,Maharashtra,20.5498,74.4557
37,423208,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Aghar KH B.O,BO,Delivery,Malegaon,Maharashtra,20.2592,74.0714
47,422209,Maharashtra Circle,Navi Mumbai Region,Malegaon Division,Ahergaon B.O,BO,Delivery,Malegaon,Maharashtra,20.1704,73.9923


In [87]:
df_final5.to_csv('df_final5.csv', index=False)

_________________________________________________________________________________________________________________________________________________________________
Step 16: Let us check the latitudes and longitudes of Navi Mumbai using geolocator.

In [88]:
address = 'Navi Mumbai, IN'

geolocator = Nominatim(user_agent="nm_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Navi Mumbai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Navi Mumbai are 19.0308262, 73.0198537.


In [89]:
df_final5.dtypes

Pincode           int64
CircleName       object
RegionName       object
DivisionName     object
OfficeName       object
OfficeType       object
Delivery         object
District         object
StateName        object
latitude        float64
longitude       float64
dtype: object

_________________________________________________________________________________________________________________________________________________________________
Step 17: We will use a folium map to plot every latitude and longitude pair in our dataset.

In [90]:
# create map of Navi Mumbai using latitude and longitude values
map_navimumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_final5['latitude'], df_final5['longitude'], df_final5['District'], df_final5['OfficeType']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_navimumbai)  
    
map_navimumbai

As we can see from the above map, our initial dataset contains pincodes from various districts across the state of Maharashtra. However, our business problem is to find suitable locations in Navi Mumbai, near the main city of Mumbai, so that we can capitalize on the new consumers living close to it.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

_________________________________________________________________________________________________________________________________________________________________
Step 18: To solve this issue, we will filter the data even further and target the district of Thane.

In [91]:
thane_data = df_final5[df_final5['District'] == 'THANE'].reset_index(drop=True)
thane_data.head()

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName,latitude,longitude
0,400708,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Airoli B.O,BO,Non Delivery,THANE,Maharashtra,19.151,72.9962
1,400614,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Belapur Node III S.O,SO,Non Delivery,THANE,Maharashtra,19.1941,73.0002
2,400706,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Darave B.O,BO,Delivery,THANE,Maharashtra,18.9894,72.961
3,400701,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Ghansoli S.O,SO,Delivery,THANE,Maharashtra,19.1167,72.9833
4,400703,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,K.U.Bazar S.O,SO,Non Delivery,THANE,Maharashtra,19.0787,73.0005


_________________________________________________________________________________________________________________________________________________________________
Step 19: We will find the latitude and longitude of the Thane district in Navi Mumbai.

In [92]:
address = 'Thane, Navi Mumbai'

geolocator = Nominatim(user_agent="nm_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Thane are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Thane are 19.0308262, 73.0198537.


_________________________________________________________________________________________________________________________________________________________________
Step 20: We can now check the location of our new filtered dataset for the Thane District.

In [93]:
map_thane = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(thane_data['latitude'], thane_data['longitude'], thane_data['OfficeName']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_thane)  
    
map_thane

From the above folium map we can conclude that the new filtered dataset for the Thane district gives us the required pincodes, which are closer to the main city of Mumbai.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
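
The visual impression from the two maps can be cross-checked numerically. The sketch below uses the haversine formula (an approximation that treats the Earth as a sphere; the helper haversine_km is a hypothetical name) to compare the distance from the geocoded Navi Mumbai centre to one row of the unfiltered data and to one row of the Thane subset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


navi_mumbai = (19.0308262, 73.0198537)  # geocoder result from Step 16
abhona = (20.0947, 73.9282)             # 'Abhona S.O', first row of df_final5
airoli = (19.151, 72.9962)              # 'Airoli B.O', first row of thane_data

d_abhona = haversine_km(*navi_mumbai, *abhona)
d_airoli = haversine_km(*navi_mumbai, *airoli)
print(round(d_abhona, 1), round(d_airoli, 1))
```

Under these assumptions the unfiltered row lies on the order of 150 km from the centre, while the Thane row lies within about 15 km.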

_________________________________________________________________________________________________________________________________________________________________
Step 21: We will set up the connection to the Foursquare API with our unique client credentials.

In [94]:
CLIENT_ID = '0DFOIXJ4IRR4R25HOXKTDYI5UT3E0I0QT4XOXSNG4JUXKGUD' # your Foursquare ID
CLIENT_SECRET = 'AJQE0SIPSPD53YRLBNLSUHM1LIYFJNXRK5PCLBFHJ2OHXXD0' # your Foursquare Secret
VERSION = '20201119' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0DFOIXJ4IRR4R25HOXKTDYI5UT3E0I0QT4XOXSNG4JUXKGUD
CLIENT_SECRET:AJQE0SIPSPD53YRLBNLSUHM1LIYFJNXRK5PCLBFHJ2OHXXD0


_________________________________________________________________________________________________________________________________________________________________
Step 22: Using the above credentials, we will request up to 100 venues within a 500 m radius of one of the neighborhoods in the Thane district.

In [95]:
thane_data.loc[0, 'OfficeName']

'Airoli B.O'

In [96]:
neighborhood_latitude = thane_data.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = thane_data.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = thane_data.loc[0, 'OfficeName'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Airoli B.O are 19.151, 72.9962.


In [97]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=0DFOIXJ4IRR4R25HOXKTDYI5UT3E0I0QT4XOXSNG4JUXKGUD&client_secret=AJQE0SIPSPD53YRLBNLSUHM1LIYFJNXRK5PCLBFHJ2OHXXD0&v=20201119&ll=19.151,72.9962&radius=500&limit=100'
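
As an alternative to formatting the query string by hand, the standard library can assemble and escape the parameters. This is only a sketch: build_explore_url is a hypothetical helper, the credentials are placeholders, and the parameter names are simply the ones that appear in the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL; urlencode takes care of escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials, not real ones.
url = build_explore_url("PLACEHOLDER_ID", "PLACEHOLDER_SECRET",
                        "20201119", 19.151, 72.9962, 500, 100)

# Parse the query string back to confirm what was encoded.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

The comma inside ll is percent-encoded in the resulting URL; parse_qs decodes it again, so the round trip returns the original latitude and longitude pair.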

In [98]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fbbfaa65111134f6b1ad012'},
 'response': {'headerLocation': 'Thāne',
  'headerFullLocation': 'Thāne',
  'headerLocationGranularity': 'city',
  'totalResults': 8,
  'suggestedBounds': {'ne': {'lat': 19.155500004500006,
    'lng': 73.00095474290409},
   'sw': {'lat': 19.146499995499994, 'lng': 72.99144525709592}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4dd946b11f6ee146834fb625',
       'name': "Domino's Pizza",
       'location': {'address': 'Shiv Shankar Plaza 2, Shop No 6 & 7',
        'lat': 19.14807783282673,
        'lng': 72.99516113891619,
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.14807783282673,
          'lng': 72.99516113891619}],
        'distance': 343,
        'postalCode': '400708',
     

In [99]:
# function that extracts the category of the venue
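def get_category_type(row):
    # raw venue records use 'categories'; rows flattened by json_normalize use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue without any category has no name to return
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']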
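
To illustrate how this helper behaves, the sketch below calls it on plain dicts that stand in for dataframe rows (the sample rows are assumptions; 'Hypothetical Venue' is made up). The function is repeated so that the snippet runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category, or None if there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Row shaped like the flattened Foursquare response.
flattened_row = {'venue.name': "Domino's Pizza",
                 'venue.categories': [{'name': 'Pizza Place'}]}
# Row that already uses the short column name.
plain_row = {'name': 'Sector-9 Bus Stop',
             'categories': [{'name': 'Bus Station'}]}
# Row with no category information at all.
uncategorised_row = {'venue.name': 'Hypothetical Venue',
                     'venue.categories': []}

for row in (flattened_row, plain_row, uncategorised_row):
    print(get_category_type(row))
```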

_________________________________________________________________________________________________________________________________________________________________
Step 23: We will put the JSON response received from Foursquare, along with the venue categories, into a pandas dataframe.

In [100]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Domino's Pizza,Pizza Place,19.148078,72.995161
1,Hotel Vaibhav Sip N Dine,Hotel Bar,19.147927,72.999466
2,Café Coffee Day,Café,19.14813,72.995247
3,McDonald's,Fast Food Restaurant,19.147545,72.995163
4,Sector-9 Bus Stop,Bus Station,19.148233,72.994297


In [101]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

8 venues were returned by Foursquare.
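
The flattening step can be tried offline on a hand-written list shaped like the 'items' entries of the response. This is a sketch under that assumption; the items list below keeps only the fields used in this step.

```python
import pandas as pd

# Two records shaped like results['response']['groups'][0]['items'].
items = [
    {"venue": {"name": "Domino's Pizza",
               "categories": [{"name": "Pizza Place"}],
               "location": {"lat": 19.148078, "lng": 72.995161}}},
    {"venue": {"name": "Sector-9 Bus Stop",
               "categories": [{"name": "Bus Station"}],
               "location": {"lat": 19.148233, "lng": 72.994297}}},
]

# Nested dicts become dotted column names; lists are kept as single cells.
flat = pd.json_normalize(items)
dotted_columns = list(flat.columns)

# Keep only the last part of each dotted name, as in the cell above.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(dotted_columns)
print(list(flat.columns))
```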


_________________________________________________________________________________________________________________________________________________________________
Step 24: Similar to Steps 22 and 23, we shall write a function that fetches up to 100 venues within a 500 m radius of every neighborhood in the Thane district and puts them in a pandas dataframe.

In [102]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
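
One fragile spot in the function above is v['venue']['categories'][0], which would fail for a venue that has no categories. The sketch below shows one possible way to build the same row tuples defensively, without making any network request (venue_rows and the sample items are hypothetical).

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare-style items into row tuples for one neighborhood."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample_items = [
    {'venue': {'name': "McDonald's",
               'location': {'lat': 19.147545, 'lng': 72.995163},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Hypothetical Venue',  # made-up record without categories
               'location': {'lat': 19.15, 'lng': 72.99},
               'categories': []}},
]

rows = venue_rows('Airoli B.O', 19.151, 72.9962, sample_items)
for row in rows:
    print(row)
```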

In [103]:
thane_venues = getNearbyVenues(names=thane_data['OfficeName'],
                                   latitudes=thane_data['latitude'],
                                   longitudes=thane_data['longitude']
                                  )

Airoli B.O
Belapur Node   III S.O
Darave B.O
Ghansoli S.O
K.U.Bazar S.O
Kopar Khairne S.O
Millenium Business Park S.O
Abje B.O
Dongari B.O
Ghodbander B.O
Additional Ambernath S.O
Aghai B.O
Amane B.O
Ambernath S.O
Apna Bazar S.O
Atali B.O
Badlapur E.D. B.O
Balegaon B.O
Balkum S.O
Bhaji Market S.O
Bhatsanagar S.O
Bhiwandi S.O
Chamble B.O
Chitalsar Manpada B.O
Dahisar B.O
Dombivali I.A. S.O
Dombivali S.O
Dwarli B.O
Gegaon B.O
Ghodbunder Road
Gokhale Road S.O (Thane)
Jambhul B.O
Jekegram S.O
Kalwa S.O
Kalyan City H.O
Kasara S.O (Thane)
Kasegaon B.O
Khadavali B.O
Khoni B.O
Kon B.O
Kopri Colony S.O
Mamnoli B.O
O.E.Ambernath S.O
Padgha S.O
Thane Bazar S.O
Ulhasnagar 4 S.O
Ulhasnagar 5 S.O
Vidyashram S.O
Vishnunagar S.O
Wagle I.E. S.O


In [104]:
print(thane_venues.shape)
thane_venues.head()

(38, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Airoli B.O,19.151,72.9962,Domino's Pizza,19.148078,72.995161,Pizza Place
1,Airoli B.O,19.151,72.9962,Hotel Vaibhav Sip N Dine,19.147927,72.999466,Hotel Bar
2,Airoli B.O,19.151,72.9962,Café Coffee Day,19.14813,72.995247,Café
3,Airoli B.O,19.151,72.9962,McDonald's,19.147545,72.995163,Fast Food Restaurant
4,Airoli B.O,19.151,72.9962,Sector-9 Bus Stop,19.148233,72.994297,Bus Station


In [105]:
thane_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Aghai B.O,1,1,1,1,1,1
Airoli B.O,8,8,8,8,8,8
Chamble B.O,1,1,1,1,1,1
Dombivali I.A. S.O,1,1,1,1,1,1
Dombivali S.O,6,6,6,6,6,6
Ghodbander B.O,9,9,9,9,9,9
K.U.Bazar S.O,4,4,4,4,4,4
Kasara S.O (Thane),3,3,3,3,3,3
Kasegaon B.O,2,2,2,2,2,2
Padgha S.O,2,2,2,2,2,2
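
Because `count()` repeats the same number in every column, a shorter per-neighborhood tally can be obtained with `size()`. The sketch below uses a made-up frame (`toy_venues` is a hypothetical stand-in for `thane_venues`, not Foursquare output) and also counts how many neighborhoods returned at least one venue.

```python
import pandas as pd

# Hypothetical stand-in for thane_venues (made-up rows, not API output)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue': ['x', 'y', 'z'],
})

# One number per neighborhood instead of one per column
venues_per_neighborhood = toy_venues.groupby('Neighborhood').size()
print(venues_per_neighborhood)

# Neighborhoods that returned at least one venue
neighborhoods_with_venues = toy_venues['Neighborhood'].nunique()
print(neighborhoods_with_venues)
```

Post offices for which the API returned no venues never appear in this frame, which is why the table has far fewer rows than the list of names printed above.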


_________________________________________________________________________________________________________________________________________________________________
Step 25: Let us check how many unique categories we have been able to fetch from the Foursquare API.

In [106]:
print('There are {} unique categories.'.format(len(thane_venues['Venue Category'].unique())))

There are 24 unique categories.
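
As a small cross-check, the same count can be obtained with `Series.nunique()`, and `value_counts()` shows how the venues are spread across categories. The frame below is made up for illustration (`toy_venues` is a hypothetical name) and does not reproduce the Foursquare result.

```python
import pandas as pd

# Hypothetical stand-in for thane_venues (made-up rows, not API output)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Pizza Place', 'Café', 'ATM', 'Café'],
})

# nunique() counts distinct non-null values
n_categories = toy_venues['Venue Category'].nunique()
print('There are {} unique categories.'.format(n_categories))

# value_counts() shows how many venues fall into each category
category_counts = toy_venues['Venue Category'].value_counts()
print(category_counts)
```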


_________________________________________________________________________________________________________________________________________________________________
Step 26: Using one-hot encoding, we create one indicator column per venue category, so that each row marks the category of a single venue.

In [107]:
# one hot encoding
thane_onehot = pd.get_dummies(thane_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
thane_onehot['Neighborhood'] = thane_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [thane_onehot.columns[-1]] + list(thane_onehot.columns[:-1])
thane_onehot = thane_onehot[fixed_columns]

thane_onehot.head()

Unnamed: 0,Neighborhood,ATM,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant,Gym,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Lake,Multiplex,Nature Preserve,Pizza Place,Plaza,Restaurant,Sandwich Place,Snack Place,Theater,Toy / Game Store,Train Station
0,Airoli B.O,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,Airoli B.O,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Airoli B.O,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Airoli B.O,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Airoli B.O,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [108]:
thane_onehot.shape

(38, 25)
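
To make the encoding step concrete, here is a minimal sketch on made-up data (`toy_venues` and `toy_onehot` are hypothetical names). It assumes a pandas version in which `get_dummies` accepts `dtype`, and it places the name column with `insert` rather than by reordering columns.

```python
import pandas as pd

# Hypothetical stand-in for thane_venues (made-up rows, not API output)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Café', 'Pizza Place', 'Café'],
})

# One indicator column per category; dtype=int keeps 0/1 instead of booleans
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)

# Put the neighborhood name in the first column
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

print(toy_onehot)
```

One reason to prefer `insert` here: it raises an error if a venue category happens to be called "Neighborhood", whereas a plain column assignment would silently overwrite that indicator column.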

_________________________________________________________________________________________________________________________________________________________________
Step 27: We group by the Neighborhood column and take the mean of each indicator column, which gives the share of each venue category within a neighborhood.

In [109]:
thane_grouped = thane_onehot.groupby('Neighborhood').mean().reset_index()
thane_grouped

Unnamed: 0,Neighborhood,ATM,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant,Gym,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Lake,Multiplex,Nature Preserve,Pizza Place,Plaza,Restaurant,Sandwich Place,Snack Place,Theater,Toy / Game Store,Train Station
0,Aghai B.O,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Airoli B.O,0.0,0.125,0.0,0.125,0.125,0.0,0.0,0.125,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0
2,Chamble B.O,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,Dombivali I.A. S.O,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Dombivali S.O,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0
5,Ghodbander B.O,0.0,0.0,0.111111,0.0,0.222222,0.111111,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0
6,K.U.Bazar S.O,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
7,Kasara S.O (Thane),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333
8,Kasegaon B.O,0.5,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Padgha S.O,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0


In [110]:
thane_grouped.shape

(11, 25)
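
The mean of a 0/1 indicator column is simply the fraction of venues in that category, so each row of the grouped frame sums to 1. A small sketch on made-up data (`toy_onehot` and `toy_grouped` are hypothetical names) illustrates this:

```python
import pandas as pd

# Hypothetical one-hot rows: four venues in A, one venue in B
toy_onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Café':         [1, 1, 0, 0, 0],
    'Pizza Place':  [0, 0, 1, 0, 0],
    'ATM':          [0, 0, 0, 1, 1],
})

# Mean of each indicator column per neighborhood = share of that category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)

# Each neighborhood's shares add up to 1
row_sums = toy_grouped.drop(columns='Neighborhood').sum(axis=1)
print(row_sums)
```

This is also why a neighborhood with a single venue, such as Aghai B.O above, shows 1.0 for one category and 0.0 for all the others.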

_________________________________________________________________________________________________________________________________________________________________
Step 28: We list the top 5 venue categories by frequency for each neighborhood.

In [111]:
num_top_venues = 5

for hood in thane_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = thane_grouped[thane_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aghai B.O----
              venue  freq
0              Lake   1.0
1               ATM   0.0
2  Asian Restaurant   0.0
3  Toy / Game Store   0.0
4           Theater   0.0


----Airoli B.O----
                  venue  freq
0      Toy / Game Store  0.12
1           Bus Station  0.12
2                  Café  0.12
3  Fast Food Restaurant  0.12
4                   Gym  0.12


----Chamble B.O----
              venue  freq
0        Restaurant   1.0
1               ATM   0.0
2  Asian Restaurant   0.0
3  Toy / Game Store   0.0
4           Theater   0.0


----Dombivali I.A. S.O----
              venue  freq
0             Hotel   1.0
1               ATM   0.0
2              Lake   0.0
3  Toy / Game Store   0.0
4           Theater   0.0


----Dombivali S.O----
                  venue  freq
0     Indian Restaurant  0.17
1                  Café  0.17
2           Snack Place  0.17
3  Fast Food Restaurant  0.17
4                   Gym  0.17


----Ghodbander B.O----
                  venue  freq
0  

_________________________________________________________________________________________________________________________________________________________________
Step 29: We define a function that returns the top 10 venue categories for each neighborhood and store the result in a pandas dataframe.

In [112]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [113]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
thane_venues_sorted = pd.DataFrame(columns=columns)
thane_venues_sorted['Neighborhood'] = thane_grouped['Neighborhood']

for ind in np.arange(thane_grouped.shape[0]):
    thane_venues_sorted.iloc[ind, 1:] = return_most_common_venues(thane_grouped.iloc[ind, :], num_top_venues)

thane_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aghai B.O,Lake,Train Station,Hotel Bar,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant
1,Airoli B.O,Hotel Bar,Asian Restaurant,Bus Station,Café,Pizza Place,Fast Food Restaurant,Gym,Toy / Game Store,Train Station,Burger Joint
2,Chamble B.O,Restaurant,Train Station,Hotel Bar,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant
3,Dombivali I.A. S.O,Hotel,Train Station,Toy / Game Store,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant
4,Dombivali S.O,Snack Place,Café,Pizza Place,Fast Food Restaurant,Indian Restaurant,Gym,Train Station,Hotel,Asian Restaurant,Burger Joint
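
In the table above, a neighborhood with a single venue (for example Aghai B.O) still gets ten entries; everything after the first appears to be a category with frequency 0, listed in arbitrary tie order. The sketch below is one possible variant, not the notebook's method: `ordinal` and `top_venues` are hypothetical helpers, the data is made up, and zero-frequency slots are left empty instead of being filled.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_venues(row, num_top_venues):
    """Top categories by frequency; slots with zero frequency are left as None."""
    freqs = row.drop('Neighborhood').astype(float)
    ranked = freqs[freqs > 0].nlargest(num_top_venues).index.tolist()
    return ranked + [None] * (num_top_venues - len(ranked))

# Hypothetical stand-in for thane_grouped (made-up frequencies)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'ATM':          [0.25, 0.0],
    'Café':         [0.50, 0.0],
    'Lake':         [0.00, 1.0],
    'Pizza Place':  [0.25, 0.0],
})

num_top_venues = 3
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i + 1))
                              for i in range(num_top_venues)]
rows = [[row['Neighborhood']] + top_venues(row, num_top_venues)
        for _, row in toy_grouped.iterrows()]
toy_sorted = pd.DataFrame(rows, columns=columns)
print(toy_sorted)
```

Leaving the empty slots visible makes it easier to tell a neighborhood with ten real categories from one where the API returned a single venue.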


_________________________________________________________________________________________________________________________________________________________________
Step 30: Now we can run k-means clustering on the grouped frequency dataframe (thane_grouped).

In [114]:
# set number of clusters
kclusters = 5

thane_grouped_clustering = thane_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(thane_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 4, 1, 0, 0, 0, 4, 3, 3], dtype=int32)
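
For readers who want to see what the labels mean, here is a minimal sketch on made-up frequency rows (`toy_features` is hypothetical). It sets `n_init` explicitly, since the default for that parameter has changed between scikit-learn releases; the value 10 is an assumption, not something taken from this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category shares: rows 0-1 are dominated by the first category,
# rows 2-3 by the last one
toy_features = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

# Fixed random_state and explicit n_init for reproducible results
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_features)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

The label values are arbitrary identifiers: only which rows share a label matters. With about ten neighborhoods and five clusters, as here, several clusters will necessarily contain only one or two rows.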

In [115]:
thane_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

thane_merged = thane_data

# join the cluster labels and top venues onto thane_data, which holds latitude/longitude for each post office
thane_merged = thane_merged.join(thane_venues_sorted.set_index('Neighborhood'), on='OfficeName')

thane_merged.head() # check the last columns!

Unnamed: 0,Pincode,CircleName,RegionName,DivisionName,OfficeName,OfficeType,Delivery,District,StateName,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400708,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Airoli B.O,BO,Non Delivery,THANE,Maharashtra,19.151,72.9962,0.0,Hotel Bar,Asian Restaurant,Bus Station,Café,Pizza Place,Fast Food Restaurant,Gym,Toy / Game Store,Train Station,Burger Joint
1,400614,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Belapur Node III S.O,SO,Non Delivery,THANE,Maharashtra,19.1941,73.0002,,,,,,,,,,,
2,400706,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Darave B.O,BO,Delivery,THANE,Maharashtra,18.9894,72.961,,,,,,,,,,,
3,400701,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,Ghansoli S.O,SO,Delivery,THANE,Maharashtra,19.1167,72.9833,,,,,,,,,,,
4,400703,Maharashtra Circle,Navi Mumbai Region,Navi Mumbai Division,K.U.Bazar S.O,SO,Non Delivery,THANE,Maharashtra,19.0787,73.0005,0.0,Theater,Bus Station,Café,Hotel,Train Station,Hotel Bar,Asian Restaurant,Burger Joint,Chinese Restaurant,Convenience Store


In [116]:
thane_merged.to_csv('thane_merged.csv', index=False)
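
The join above keeps every post office, so offices without venues get NaN in the new columns, and the cluster labels are shown as floats (0.0 rather than 0). If only the clustered rows are needed, one option is sketched below on made-up data; `toy_data`, `toy_sorted` and `toy_clustered` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-ins for thane_data and thane_venues_sorted
toy_data = pd.DataFrame({'OfficeName': ['A', 'B', 'C'],
                         'latitude': [19.1, 19.2, 19.3]})
toy_sorted = pd.DataFrame({'Neighborhood': ['A', 'C'],
                           'Cluster Labels': [0, 3]})

# Left join on the office name: 'B' has no match, so its label becomes NaN
toy_merged = toy_data.join(toy_sorted.set_index('Neighborhood'), on='OfficeName')
print(toy_merged)

# Keep only rows that received a cluster, then restore integer labels
toy_clustered = toy_merged.dropna(subset=['Cluster Labels']).copy()
toy_clustered['Cluster Labels'] = toy_clustered['Cluster Labels'].astype(int)
print(toy_clustered)
```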

_________________________________________________________________________________________________________________________________________________________________
Step 31: One by one, we can check all the clusters.

In [117]:
thane_merged.loc[thane_merged['Cluster Labels'] == 0, thane_merged.columns[[1] + list(range(5, thane_merged.shape[1]))]]

Unnamed: 0,CircleName,OfficeType,Delivery,District,StateName,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Maharashtra Circle,BO,Non Delivery,THANE,Maharashtra,19.151,72.9962,0.0,Hotel Bar,Asian Restaurant,Bus Station,Café,Pizza Place,Fast Food Restaurant,Gym,Toy / Game Store,Train Station,Burger Joint
4,Maharashtra Circle,SO,Non Delivery,THANE,Maharashtra,19.0787,73.0005,0.0,Theater,Bus Station,Café,Hotel,Train Station,Hotel Bar,Asian Restaurant,Burger Joint,Chinese Restaurant,Convenience Store
9,Maharashtra Circle,BO,Non Delivery,THANE,Maharashtra,19.2836,72.8675,0.0,Café,Ice Cream Shop,Burger Joint,Sandwich Place,Pizza Place,Chinese Restaurant,Multiplex,Fast Food Restaurant,Hotel,Asian Restaurant
26,Maharashtra Circle,SO,Non Delivery,THANE,Maharashtra,19.2167,73.0833,0.0,Snack Place,Café,Pizza Place,Fast Food Restaurant,Indian Restaurant,Gym,Train Station,Hotel,Asian Restaurant,Burger Joint


In [118]:
thane_merged.loc[thane_merged['Cluster Labels'] == 1, thane_merged.columns[[1] + list(range(5, thane_merged.shape[1]))]]

Unnamed: 0,CircleName,OfficeType,Delivery,District,StateName,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Maharashtra Circle,SO,Delivery,THANE,Maharashtra,19.267,73.0715,1.0,Hotel,Train Station,Toy / Game Store,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant
48,Maharashtra Circle,SO,Delivery,THANE,Maharashtra,19.267,73.0715,1.0,Hotel,Train Station,Toy / Game Store,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant


In [119]:
thane_merged.loc[thane_merged['Cluster Labels'] == 2, thane_merged.columns[[1] + list(range(5, thane_merged.shape[1]))]]

Unnamed: 0,CircleName,OfficeType,Delivery,District,StateName,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Maharashtra Circle,BO,Delivery,THANE,Maharashtra,19.4994,73.3348,2.0,Lake,Train Station,Hotel Bar,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant


In [120]:
thane_merged.loc[thane_merged['Cluster Labels'] == 3, thane_merged.columns[[1] + list(range(5, thane_merged.shape[1]))]]

Unnamed: 0,CircleName,OfficeType,Delivery,District,StateName,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Maharashtra Circle,BO,Delivery,THANE,Maharashtra,19.2861,73.4934,3.0,ATM,Convenience Store,Toy / Game Store,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Fast Food Restaurant,Gym
43,Maharashtra Circle,SO,Delivery,THANE,Maharashtra,19.3669,73.1758,3.0,ATM,Plaza,Hotel Bar,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant


In [122]:
thane_merged.loc[thane_merged['Cluster Labels'] == 4, thane_merged.columns[[1] + list(range(5, thane_merged.shape[1]))]]

Unnamed: 0,CircleName,OfficeType,Delivery,District,StateName,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Maharashtra Circle,BO,Delivery,THANE,Maharashtra,19.7466,73.0878,4.0,Restaurant,Train Station,Hotel Bar,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store,Fast Food Restaurant
35,Maharashtra Circle,SO,Delivery,THANE,Maharashtra,19.6451,73.4743,4.0,Train Station,Restaurant,Nature Preserve,Hotel Bar,Asian Restaurant,Burger Joint,Bus Station,Café,Chinese Restaurant,Convenience Store


Looking at the clusters above, cluster 0 groups the pincodes whose most common venues are mostly restaurants, cafés and other food outlets. We should therefore pick one of the pincodes in this cluster as the location for the restaurant.

Further, pincode 400703 (K.U.Bazar S.O) has many other types of venues in its top ten, but no Pizza Place. Hence, starting a pizza place within a radius of 500 meters of this pincode looks like a good business idea.
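
The "no Pizza Place" check can also be done programmatically instead of by reading the tables. The sketch below runs on made-up rows: the pincodes and venue lists are hypothetical, and only the column naming pattern follows the merged dataframe.

```python
import pandas as pd

# Hypothetical stand-in for thane_merged (made-up pincodes and venues)
toy_merged = pd.DataFrame({
    'Pincode': [111111, 222222, 333333],
    'Cluster Labels': [0, 0, 1],
    '1st Most Common Venue': ['Café', 'Theater', 'ATM'],
    '2nd Most Common Venue': ['Pizza Place', 'Café', 'Plaza'],
})

# All "Nth Most Common Venue" columns
venue_cols = [c for c in toy_merged.columns if c.endswith('Most Common Venue')]

# Rows of cluster 0 whose top venues do not include a Pizza Place
cluster0 = toy_merged[toy_merged['Cluster Labels'] == 0]
has_pizza = cluster0[venue_cols].eq('Pizza Place').any(axis=1)
pincodes_without_pizza = cluster0.loc[~has_pizza, 'Pincode'].tolist()
print(pincodes_without_pizza)
```

A check like this only looks at the ranked top-venue columns, so it inherits their limitation: a category can be missing simply because the API returned few venues for that location.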