<h1 align=center><font size = 5>Segmenting and Clustering for Cities in INDIA</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Indian States</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [30]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
try:
    from pandas import json_normalize # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize # older pandas; transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.21.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

## 1. Download and Explore Dataset

The dataset covers 25 states (as of 2011) and 431 districts of India. To segment the districts and explore them, we need a dataset that lists the states and the districts in each state, along with the latitude and longitude coordinates of each district's centroid.

This dataset is available on Kaggle at https://www.kaggle.com/sirpunch/indian-census-data-with-geospatial-indexing#district%20wise%20population%20and%20centroids.csv.
However, the data for this project has already been downloaded and saved on GitHub.

#### Load and explore the data

In [31]:
import pandas as pd
india=pd.read_csv('https://raw.githubusercontent.com/AmitLinde/Coursera_Capstone/master/Cities.csv')
print(india.shape)
india.head()

(431, 6)


Unnamed: 0,State,District,Latitude,Longitude,Population in 2001,Population in 2011
0,Andhra Pradesh,Anantapur,14.312066,77.460158,3640478,4081148
1,Andhra Pradesh,Chittoor,13.331093,78.927639,3745875,4174064
2,Andhra Pradesh,East Godavari,16.782718,82.243207,4901420,5154296
3,Andhra Pradesh,Guntur,15.884926,80.586576,4465144,4887813
4,Andhra Pradesh,Krishna,16.143873,81.148051,4187841,4517398


In [32]:
print('The dataframe has {} states and {} districts.'.format(
len(india['State'].unique()),
india.shape[0]
)
)

The dataframe has 25 states and 431 districts.


### Select the districts with the largest populations

In [33]:
districts = india[['State', 'District', 'Latitude', 'Longitude','Population in 2011']].sort_values(by=['Population in 2011'],ascending=False).head(20).reset_index()
districts

Unnamed: 0,index,State,District,Latitude,Longitude,Population in 2011
0,261,Maharashtra,Thane,19.698768,72.798827,11060148
1,255,Maharashtra,Pune,18.516962,74.129229,9429408
2,428,West Bengal,Murshidabad,24.259507,88.168169,7103807
3,322,Rajasthan,Jaipur,27.033459,75.771173,6626178
4,252,Maharashtra,Nashik,20.266913,74.038242,6107187
5,108,Gujarat,Surat,21.257614,72.934887,6081322
6,363,Uttar Pradesh,Allahabad,25.392039,82.051724,5954391
7,65,Bihar,Patna,25.402971,85.319843,5838465
8,429,West Bengal,Nadia,23.56411,88.58293,5167600
9,2,Andhra Pradesh,East Godavari,16.782718,82.243207,5154296


In [34]:
print('The dataframe has {} states and {} districts.'.format(
len(districts['State'].unique()),
districts.shape[0]
)
)

The dataframe has 9 states and 20 districts.


#### Use the geopy library to get the latitude and longitude values of India.

In [35]:
address = 'India'
geolocator = Nominatim(user_agent="india_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the centroid of India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the centroid of India are 22.3511148, 78.6677428.
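
`geocode` returns `None` when the service finds no match, and `location.latitude` would then raise an `AttributeError`. The following is a minimal sketch of a guard, assuming a hypothetical helper named `coords_or_default` and a fallback coordinate pair of our own choosing. The geocoder is passed in as a callable so the helper can be tried without network access.

```python
from collections import namedtuple

# Stand-in for the object returned by geopy (only the two attributes used here).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

def coords_or_default(geocode, address, default):
    """Return (lat, lng) for address, or default if the geocoder finds nothing.

    geocode is any callable shaped like Nominatim(...).geocode.
    coords_or_default is a hypothetical helper, not part of geopy.
    """
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Illustration with a fake geocoder that only knows 'India'.
def fake_geocode(address):
    return FakeLocation(22.3511148, 78.6677428) if address == 'India' else None

india_centroid = coords_or_default(fake_geocode, 'India', default=(0.0, 0.0))
unknown = coords_or_default(fake_geocode, 'Nowhere', default=(0.0, 0.0))
```

In the notebook, `geolocator.geocode` would take the place of `fake_geocode`.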


# create map of India using latitude and longitude values
map_india = folium.Map(location=[latitude, longitude], zoom_start=5) # zoom out far enough to show the whole country
# add markers to map
for lat, lng, State, District in zip(india['Latitude'], india['Longitude'], india['State'], india['District']):
    label = '{}, {}'.format(District, State)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_india)
map_india

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [49]:
CLIENT_ID = 'HT4J1D0RWL52AV4QMEJ2DE0A0L1NWSUYRIVURPOTOXY04OAY' # your Foursquare ID
CLIENT_SECRET = 'J5QCUFEICHUV5NFJVKI1STS2M1WXSQOVJ3CBQR3KMI0CY0FM' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: HT4J1D0RWL52AV4QMEJ2DE0A0L1NWSUYRIVURPOTOXY04OAY
CLIENT_SECRET:J5QCUFEICHUV5NFJVKI1STS2M1WXSQOVJ3CBQR3KMI0CY0FM
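
Credentials written into a notebook end up in every shared copy of it. As a sketch of one alternative, the snippet below reads them from environment variables and masks the secret before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Example with a stand-in mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
masked = mask(client_secret)
print('CLIENT_SECRET: ' + masked)
```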


#### Let's explore the first district in our dataframe.
Get the district's name.

In [50]:
districts.loc[0, 'District']

'Thane'

Get the district's latitude and longitude values.

In [51]:
neighborhood_latitude = districts.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = districts.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = districts.loc[0, 'District'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Thane are 19.698767747440268, 72.79882742564581.


#### Now, let's get the top 25 venues that are within a radius of 50,000 meters of the above coordinates

First, let's create the GET request URL. Name your URL **url**.

In [74]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    'HT4J1D0RWL52AV4QMEJ2DE0A0L1NWSUYRIVURPOTOXY04OAY', 
    'J5QCUFEICHUV5NFJVKI1STS2M1WXSQOVJ3CBQR3KMI0CY0FM', 
    20200101, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    50000, 
    25)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=HT4J1D0RWL52AV4QMEJ2DE0A0L1NWSUYRIVURPOTOXY04OAY&client_secret=J5QCUFEICHUV5NFJVKI1STS2M1WXSQOVJ3CBQR3KMI0CY0FM&v=20200101&ll=19.698767747440268,72.79882742564581&radius=50000&limit=25'
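
The same URL can also be assembled from a dictionary of parameters, which keeps each value next to its name and lets the standard library handle the encoding. This is only a sketch: `build_explore_url` is a hypothetical helper, and the credentials passed to it below are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20200101', radius=50000, limit=25):
    """Build a venues/explore URL with encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; the coordinates are those of Thane from the table above.
demo_url = build_explore_url('demo-id', 'demo-secret', 19.698768, 72.798827)

# Parse the URL back to check that every parameter survived the round trip.
query = parse_qs(urlparse(demo_url).query)
```

Note that `urlencode` percent-encodes the comma in `ll`; parsing the query string decodes it again.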

Send the GET request and examine the results.

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [75]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
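
To see what the function returns in each case, here is a small sketch that applies it to three made-up rows. Plain dictionaries stand in for dataframe rows, and the function is restated so the snippet runs on its own.

```python
def get_category_type(row):
    """Same logic as the function above, restated so the sketch is self-contained."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows; plain dicts stand in for DataFrame rows.
flat_row = {'venue.name': 'Kelwa Beach',
            'venue.categories': [{'name': 'Beach'}]}
search_row = {'name': 'Subway',
              'categories': [{'name': 'Sandwich Place'},
                             {'name': 'Fast Food Restaurant'}]}
empty_row = {'venue.name': 'Unlabelled venue', 'venue.categories': []}

labels = [get_category_type(r) for r in (flat_row, search_row, empty_row)]
print(labels)
```

Only the first category of each venue is kept, and a venue with no categories yields `None`.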

Now we are ready to clean the JSON and structure it into a *pandas* dataframe.

In [76]:
import requests

results = requests.get(url).json()
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.shape

(21, 4)

In [77]:
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,NATURALS ICE-CREAM,Ice Cream Shop,19.447924,72.801314
1,Bhajansons Dairy Farm,Restaurant,19.380189,72.898666
2,Celebrity Restaurant,Indian Restaurant,19.399813,72.8406
3,Hotel Anam,Indian Restaurant,19.70739,72.921284
4,Silent Hills Resort,Pool,19.709589,72.932746
5,Shri Datta Snacks,Breakfast Spot,19.684512,72.908546
6,Pearline,Restaurant,19.991458,72.723237
7,True Taste,Restaurant,19.377543,72.827801
8,Kelwa Beach,Beach,19.613246,72.731072
9,Subway,Sandwich Place,19.258144,72.853034


In [78]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

21 venues were returned by Foursquare.


## 2.  Explore Neighborhoods in Indian States

#### Let's create a function to repeat the same process for all the districts

In [79]:
def getNearbyVenues(names, latitudes, longitudes, radius=50000,LIMIT=25):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL, honoring the radius and LIMIT arguments
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            20200101, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
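
The list comprehension in the function assumes every venue has at least one category; a venue with an empty `categories` list would raise an `IndexError`. Below is a minimal sketch of the row-extraction step with that case guarded. `venue_rows` is a hypothetical helper name, and the sample items are invented to mimic the shape of `results['response']['groups'][0]['items']`.

```python
def venue_rows(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples.

    Venues without any category get None instead of raising IndexError.
    venue_rows is a hypothetical helper name for this sketch.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hypothetical items shaped like results['response']['groups'][0]['items'].
sample_items = [
    {'venue': {'name': 'Kelwa Beach',
               'location': {'lat': 19.613246, 'lng': 72.731072},
               'categories': [{'name': 'Beach'}]}},
    {'venue': {'name': 'Unlabelled venue',
               'location': {'lat': 19.7, 'lng': 72.8},
               'categories': []}},
]
rows = venue_rows('Thane', 19.698768, 72.798827, sample_items)
```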

#### Now write the code to run the above function on each district and create a new dataframe called *india_venues*.

In [80]:
india_venues = getNearbyVenues(names=india['District'],
                                   latitudes=india['Latitude'],
                                   longitudes=india['Longitude']
                                )    

Anantapur
Chittoor
East Godavari
Guntur
Krishna
Kurnool
Prakasam
Srikakulam
Vizianagaram
West Godavari
Changlang
East Kameng
East Siang
Kurung Kumey
Lohit
Lower Dibang Valley
Lower Subansiri
Papum Pare
Tawang
Tirap
Upper Siang
Upper Subansiri
West Kameng
West Siang
Barpeta
Bongaigaon
Cachar
Darrang
Dhemaji
Dibrugarh
Goalpara
Golaghat
Hailakandi
Jorhat
Kamrup
Karbi Anglong
Karimganj
Kokrajhar
Lakhimpur
Nagaon
Nalbari
Sonitpur
Tinsukia
Araria
Aurangabad
Banka
Begusarai
Bhagalpur
Bhojpur
Buxar
Darbhanga
Gaya
Gopalganj
Jamui
Jehanabad
Katihar
Khagaria
Kishanganj
Lakhisarai
Madhepura
Madhubani
Munger
Muzaffarpur
Nalanda
Nawada
Patna
Purnia
Rohtas
Saharsa
Samastipur
Saran
Sheikhpura
Sheohar
Sitamarhi
Siwan
Supaul
Vaishali
Chandigarh
Bastar
Bilaspur
Dhamtari
Durg
Janjgir-Champa
Jashpur
Korba
Koriya
Mahasamund
Raigarh
Raipur
Surguja
North Goa
South Goa
Amreli
Anand
Banas Kantha
Bharuch
Bhavnagar
Gandhinagar
Jamnagar
Junagadh
Kachchh
Kheda
Narmada
Navsari
Patan
Porbandar
Rajkot
Sabar Kantha
Sur

In [81]:
india_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Anantapur,14.312066,77.460158,Hanuman Hillrock Cafe,14.165056,77.812683,Vegetarian / Vegan Restaurant
1,Anantapur,14.312066,77.460158,Western Canteen ( Foreign Canteen ),14.16592,77.808753,Vegetarian / Vegan Restaurant
2,Anantapur,14.312066,77.460158,BIG Cinemas,14.681915,77.603515,Movie Theater
3,Anantapur,14.312066,77.460158,Sri Sagar,14.67861,77.601763,Vegetarian / Vegan Restaurant
4,Anantapur,14.312066,77.460158,Hotel Masineni Grand,14.678782,77.602078,Hotel


In [84]:
india_venues.shape

(4669, 7)

In [85]:
india_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agra,25,25,25,25,25,25
Aizawl,5,5,5,5,5,5
Ajmer,25,25,25,25,25,25
Akola,10,10,10,10,10,10
Alappuzha,25,25,25,25,25,25
Aligarh,15,15,15,15,15,15
Allahabad,19,19,19,19,19,19
Alwar,6,6,6,6,6,6
Ambala,25,25,25,25,25,25
Ambedkar Nagar,4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [87]:
print('There are {} unique categories.'.format(len(india_venues['Venue Category'].unique())))

There are 250 unique categories.


## 3. Analyze Each Neighborhood

In [89]:
# one hot encoding
india_onehot = pd.get_dummies(india_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
india_onehot['Neighborhood'] = india_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [india_onehot.columns[-1]] + list(india_onehot.columns[:-1])
india_onehot = india_onehot[fixed_columns]

india_onehot.head()

Unnamed: 0,Zoo,ATM,Afghan Restaurant,Airport,Airport Food Court,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Garage,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beach,Bed & Breakfast,Bengali Restaurant,Big Box Store,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Burger Joint,Bus Station,Business Service,Cafeteria,Café,Campground,Canal Lock,Cantonese Restaurant,Casino,Castle,Cave,Child Care Service,Chinese Restaurant,Chocolate Shop,City,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Country Dance Club,Cricket Ground,Cupcake Shop,Currency Exchange,Dairy Store,Department Store,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Spring,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indonesian Restaurant,Island,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kerala Restaurant,Kids Store,Korean Restaurant,Lake,Library,Light Rail Station,Lighthouse,Lighting Store,Lounge,Maharashtrian Restaurant,Market,Mattress Store,Medical Supply Store,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,National Park,Nature Preserve,Neighborhood,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Pakistani Restaurant,Palace,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Pub,Racetrack,Rajasthani Restaurant,Rental Car Location,Reservoir,Resort,Rest Area,Restaurant,River,Road,Roof Deck,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Shrine,Sikh Temple,Ski Area,Ski Chairlift,Ski Lodge,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Summer Camp,Supermarket,Surf Spot,Sushi Restaurant,Swiss Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Tibetan Restaurant,Toll Booth,Tour Provider,Town,Toy / Game Store,Track Stadium,Trail,Train Station,Travel & Transport,Vacation Rental,Vegetarian / Vegan Restaurant,Video Store,Village,Vineyard,Waterfall,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [90]:
india_onehot.shape

(4669, 250)
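
In the output above, `Zoo` is the first column, the district name sits in the middle of each row, and the shape has 250 columns, the same as the number of unique categories. This appears consistent with one of the Foursquare categories being itself named `Neighborhood`, so that the assignment overwrites that dummy column instead of appending a new one. The following is a minimal sketch of one way to avoid the collision, on made-up data and assuming a different label name (`District`, chosen only for this illustration).

```python
import pandas as pd

# Made-up venues; one of them has the category 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Neighborhood', 'Zoo'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# Insert the label as the first column under a name that cannot clash
# with a category. 'District' is an assumption made for this sketch.
onehot.insert(0, 'District', venues['Neighborhood'])

print(list(onehot.columns))
print(onehot.shape)
```

With this approach the `Neighborhood` category keeps its own dummy column, and later steps would group by `District` instead.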

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [91]:
india_grouped = india_onehot.groupby('Neighborhood').mean().reset_index()
india_grouped

Unnamed: 0,Neighborhood,Zoo,ATM,Afghan Restaurant,Airport,Airport Food Court,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Garage,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beach,Bed & Breakfast,Bengali Restaurant,Big Box Store,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Burger Joint,Bus Station,Business Service,Cafeteria,Café,Campground,Canal Lock,Cantonese Restaurant,Casino,Castle,Cave,Child Care Service,Chinese Restaurant,Chocolate Shop,City,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Country Dance Club,Cricket Ground,Cupcake Shop,Currency Exchange,Dairy Store,Department Store,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Spring,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indonesian Restaurant,Island,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kerala Restaurant,Kids Store,Korean Restaurant,Lake,Library,Light Rail Station,Lighthouse,Lighting Store,Lounge,Maharashtrian Restaurant,Market,Mattress Store,Medical Supply Store,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,National Park,Nature Preserve,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Pakistani Restaurant,Palace,Park,Performing Arts Venue,Pharmacy,Pier,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Pub,Racetrack,Rajasthani Restaurant,Rental Car Location,Reservoir,Resort,Rest Area,Restaurant,River,Road,Roof Deck,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Shrine,Sikh Temple,Ski Area,Ski Chairlift,Ski Lodge,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Summer Camp,Supermarket,Surf Spot,Sushi Restaurant,Swiss Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Tibetan Restaurant,Toll Booth,Tour Provider,Town,Toy / Game Store,Track Stadium,Trail,Train Station,Travel & Transport,Vacation Rental,Vegetarian / Vegan Restaurant,Video Store,Village,Vineyard,Waterfall,Yoga Studio
0,Agra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.28,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aizawl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ajmer,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0
3,Akola,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alappuzha,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.12,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.44,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Aligarh,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Allahabad,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Alwar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Ambala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.04,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Ambedkar Nagar,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [92]:
# let's confirm the new size
india_grouped.shape

(422, 250)

#### Let's print each neighborhood along with the top 5 most common venues

In [95]:
num_top_venues = 5

for hood in india_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = india_grouped[india_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agra----
                            venue  freq
0                           Hotel  0.28
1               Indian Restaurant  0.20
2                   Historic Site  0.12
3                          Resort  0.08
4  Multicuisine Indian Restaurant  0.08


----Aizawl----
           venue  freq
0          Hotel   0.4
1  Shopping Mall   0.2
2           Park   0.2
3     Soup Place   0.2
4         Office   0.0


----Ajmer----
               venue  freq
0               Café  0.16
1  Indian Restaurant  0.16
2              Hotel  0.16
3        Pizza Place  0.12
4               Lake  0.08


----Akola----
               venue  freq
0  Mobile Phone Shop   0.7
1                ATM   0.2
2       Dessert Shop   0.1
3             Palace   0.0
4      National Park   0.0


----Alappuzha----
               venue  freq
0             Resort  0.44
1  Indian Restaurant  0.12
2              Hotel  0.08
3               Lake  0.04
4       Dessert Shop  0.04


----Aligarh----
                     venue  freq
0  

#### Let's put that into a *pandas* dataframe
First, let's write a function to sort the venues in descending order.

In [96]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
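
As a minimal sketch of how this function behaves, the snippet below applies it to a single made-up row. The row name `Toyville` and its three venue frequencies are hypothetical values chosen for illustration; they are not taken from `india_grouped`.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then rank the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row, shaped like one row of india_grouped
toy_row = pd.Series({'Neighborhood': 'Toyville', 'ATM': 0.1, 'Hotel': 0.6, 'Park': 0.3})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

Because the function only sorts and slices, categories with a frequency of 0.0 are returned too when a row has fewer non-zero categories than `num_top_venues`. Akola, for instance, has only three non-zero categories in the top-5 printout above.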

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [100]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
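    try:
        # ranks 1 to 3 take the suffixes listed in indicators
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later rank takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))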

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = india_grouped['Neighborhood']

for ind in np.arange(india_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(india_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agra,Hotel,Indian Restaurant,Historic Site,Fast Food Restaurant,Multicuisine Indian Restaurant,Resort,Garden,Fried Chicken Joint,Pool,Bed & Breakfast
1,Aizawl,Hotel,Shopping Mall,Soup Place,Park,Food,Fast Food Restaurant,Field,Flea Market,Fondue Restaurant,Yoga Studio
2,Ajmer,Indian Restaurant,Café,Hotel,Pizza Place,Vegetarian / Vegan Restaurant,Lake,Fast Food Restaurant,City,Bakery,Resort
3,Akola,Mobile Phone Shop,ATM,Dessert Shop,Yoga Studio,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant
4,Alappuzha,Resort,Indian Restaurant,Hotel,Lake,Café,General Travel,Restaurant,Lighthouse,Beach,Scenic Lookout


## 4. Cluster Neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters.

In [102]:
# set number of clusters
kclusters = 5

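# keep only the numeric frequency columns; pass axis by keyword, since the positional form is no longer accepted by recent pandas versions
india_grouped_clustering = india_grouped.drop('Neighborhood', axis=1)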

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(india_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 3, 1, 3, 2, 2, 3, 2, 4], dtype=int32)
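
The first ten labels say little about how the rows are distributed overall. The sketch below, on a small made-up frequency table, shows one way to count how many rows fall into each cluster. The names `toy`, `features` and `toy_kmeans` and all the numbers are assumptions for illustration, and `n_init=10` is set explicitly only to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# hypothetical frequency table with two clearly separated groups
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'ATM':   [0.9, 0.8, 0.0, 0.1],
    'Hotel': [0.1, 0.2, 1.0, 0.9],
})

# cluster on the numeric columns only
features = toy.drop(columns='Neighborhood')
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)

# number of rows assigned to each cluster label
cluster_sizes = np.bincount(toy_kmeans.labels_, minlength=2)
print(cluster_sizes)
```

The label numbers themselves are arbitrary identifiers, so only the grouping and the group sizes carry meaning.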

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [125]:
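# add clustering labels (guarded so that re-running the cell does not fail on an existing column)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)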

india_merged = india[['State','District','Latitude','Longitude']]

# join neighborhoods_venues_sorted onto india_merged so each district keeps its latitude/longitude alongside its cluster label and top venues
india_merged = india_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='District')

india_merged.head() # check the last columns

Unnamed: 0,State,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Andhra Pradesh,Anantapur,14.312066,77.460158,2.0,Vegetarian / Vegan Restaurant,Movie Theater,Hotel,Yoga Studio,Food Court,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Forest
1,Andhra Pradesh,Chittoor,13.331093,78.927639,3.0,Indian Restaurant,Café,Pizza Place,Historic Site,Hotel,Snack Place,Burger Joint,Bus Station,Chinese Restaurant,South Indian Restaurant
2,Andhra Pradesh,East Godavari,16.782718,82.243207,3.0,Multiplex,Indian Restaurant,Pizza Place,Fried Chicken Joint,Bakery,Lounge,Asian Restaurant,Hotel,Coffee Shop,Department Store
3,Andhra Pradesh,Guntur,15.884926,80.586576,3.0,Ice Cream Shop,Hotel,Indian Restaurant,Movie Theater,Beach,Indie Movie Theater,Market,Train Station,Flea Market,Fondue Restaurant
4,Andhra Pradesh,Krishna,16.143873,81.148051,3.0,Beach,Indian Restaurant,Movie Theater,Cosmetics Shop,Yoga Studio,Forest,Food,Food & Drink Shop,Food Court,Food Truck
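
The cluster labels appear as floats (2.0, 3.0) in the output above. With a left join this is what pandas does when some rows find no match: the missing entries become NaN, which forces the column to a float type. The sketch below reproduces this on two made-up rows; `districts`, `labels` and the district name `Nowhere` are hypothetical.

```python
import pandas as pd

# hypothetical inputs: one district has venue data, the other does not
districts = pd.DataFrame({'District': ['Agra', 'Nowhere']})
labels = pd.DataFrame({'Neighborhood': ['Agra'], 'Cluster Labels': [3]})

# same join pattern as above: left join on the district name
merged = districts.join(labels.set_index('Neighborhood'), on='District')

# rows that found no match carry NaN in the label column
unmatched = merged[merged['Cluster Labels'].isna()]

# one option: drop the unmatched rows and restore integer labels
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```

Whether to drop unmatched districts or keep them with a placeholder depends on what the map is meant to show, so this is only one possible choice.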


## 5. Examine Clusters

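Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you. Note that the k-means labels run from 0 to 4, so the heading "Cluster 1" corresponds to label 0, "Cluster 2" to label 1, and so on.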
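
One possible starting point for naming the clusters is to look at which category most often ranks first within each cluster. The sketch below does this on a small made-up table; `toy_merged` and its values are hypothetical, and only the column names mirror `india_merged`.

```python
import pandas as pd

# hypothetical stand-in for india_merged with just the columns needed here
toy_merged = pd.DataFrame({
    'District': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Labels': [0, 0, 0, 1, 1],
    '1st Most Common Venue': ['ATM', 'ATM', 'Hotel', 'Train Station', 'Train Station'],
})

# for each cluster, pick the category that ranks first most often
dominant = (
    toy_merged.groupby('Cluster Labels')['1st Most Common Venue']
    .agg(lambda s: s.value_counts().idxmax())
)
print(dominant)
```

This looks only at the top-ranked venue, so it is a rough summary rather than a full description of a cluster.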

## Cluster 1

In [127]:
india_merged.loc[india_merged['Cluster Labels'] == 0, india_merged.columns[[1] + list(range(5, india_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Upper Subansiri,ATM,Yoga Studio,Fast Food Restaurant,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
23,West Siang,ATM,Yoga Studio,Fast Food Restaurant,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
25,Bongaigaon,ATM,Train Station,Diner,Forest,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Yoga Studio
26,Cachar,ATM,Airport Terminal,Shopping Mall,Food Truck,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Yoga Studio
41,Sonitpur,ATM,Indian Restaurant,Zoo,Lighting Store,Mobile Phone Shop,Bank,National Park,Fried Chicken Joint,French Restaurant,Flea Market
45,Banka,ATM,Pizza Place,Hotel,Train Station,Ice Cream Shop,Yoga Studio,Field,Flea Market,Fondue Restaurant,Food
50,Darbhanga,ATM,Shopping Mall,Motorcycle Shop,Yoga Studio,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
58,Lakhisarai,ATM,Train Station,Platform,Clothing Store,Yoga Studio,Food Truck,Flea Market,Fondue Restaurant,Food,Food & Drink Shop
59,Madhepura,ATM,Yoga Studio,Fast Food Restaurant,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
60,Madhubani,ATM,Airport,Train Station,Indian Restaurant,Yoga Studio,Food Truck,Flea Market,Fondue Restaurant,Food,Food & Drink Shop


## Cluster 2

In [128]:
india_merged.loc[india_merged['Cluster Labels'] == 1, india_merged.columns[[1] + list(range(5, india_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
233,Sidhi,Mobile Phone Shop,ATM,Scenic Lookout,Yoga Studio,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck
238,Akola,Mobile Phone Shop,ATM,Dessert Shop,Yoga Studio,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant
243,Dhule,ATM,Breakfast Spot,Fast Food Restaurant,Mobile Phone Shop,Yoga Studio,Forest,Food,Food & Drink Shop,Food Court,Food Truck
244,Hingoli,Mobile Phone Shop,ATM,Indian Restaurant,Forest,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Yoga Studio
246,Jalna,Mobile Phone Shop,ATM,Hotel,Train Station,Yoga Studio,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court
251,Nandurbar,ATM,Mobile Phone Shop,Café,Yoga Studio,Flea Market,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant
253,Osmanabad,Mobile Phone Shop,Train Station,Yoga Studio,Fast Food Restaurant,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
254,Parbhani,Mobile Phone Shop,Train Station,ATM,Platform,Yoga Studio,Food Truck,Flea Market,Fondue Restaurant,Food,Food & Drink Shop
260,Solapur,Mobile Phone Shop,ATM,Maharashtrian Restaurant,Train Station,Yoga Studio,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck
263,Washim,Mobile Phone Shop,Indian Restaurant,Yoga Studio,Field,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest


## Cluster 3

In [129]:
india_merged.loc[india_merged['Cluster Labels'] == 2, india_merged.columns[[1] + list(range(5, india_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Anantapur,Vegetarian / Vegan Restaurant,Movie Theater,Hotel,Yoga Studio,Food Court,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Forest
6,Prakasam,Food Court,Hotel,Arts & Crafts Store,French Restaurant,Yoga Studio,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Truck
10,Changlang,National Park,Yoga Studio,Field,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
12,East Siang,Cave,Yoga Studio,Fast Food Restaurant,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
14,Lohit,Movie Theater,Soccer Stadium,Snack Place,Yoga Studio,Garden,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
15,Lower Dibang Valley,Campground,Yoga Studio,Field,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
16,Lower Subansiri,Campground,Yoga Studio,Field,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck
17,Papum Pare,Ice Cream Shop,Trail,River,Campground,Fast Food Restaurant,Forest,Food Truck,French Restaurant,Fried Chicken Joint,Farmers Market
18,Tawang,Mountain,Lake,Yoga Studio,Field,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest
19,Tirap,Scenic Lookout,Vacation Rental,Flea Market,Bar,Train Station,Trail,Airport,French Restaurant,Forest,Farmers Market


## Cluster 4

In [130]:
india_merged.loc[india_merged['Cluster Labels'] == 3, india_merged.columns[[1] + list(range(5, india_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chittoor,Indian Restaurant,Café,Pizza Place,Historic Site,Hotel,Snack Place,Burger Joint,Bus Station,Chinese Restaurant,South Indian Restaurant
2,East Godavari,Multiplex,Indian Restaurant,Pizza Place,Fried Chicken Joint,Bakery,Lounge,Asian Restaurant,Hotel,Coffee Shop,Department Store
3,Guntur,Ice Cream Shop,Hotel,Indian Restaurant,Movie Theater,Beach,Indie Movie Theater,Market,Train Station,Flea Market,Fondue Restaurant
4,Krishna,Beach,Indian Restaurant,Movie Theater,Cosmetics Shop,Yoga Studio,Forest,Food,Food & Drink Shop,Food Court,Food Truck
8,Vizianagaram,Indian Restaurant,Train Station,Pizza Place,Food Court,Field,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Yoga Studio
68,Saharsa,Platform,Convenience Store,Train Station,Indian Restaurant,Historic Site,French Restaurant,Fried Chicken Joint,Forest,Furniture / Home Store,Farmers Market
79,Bilaspur,Indian Restaurant,Café,Jewelry Store,Travel & Transport,Market,Pizza Place,Resort,Hotel,Reservoir,Museum
80,Dhamtari,Indian Restaurant,ATM,Motorcycle Shop,Beach,Food Court,Forest,Food,Food & Drink Shop,Food Truck,Yoga Studio
94,Banas Kantha,Indian Restaurant,Memorial Site,Hotel,Food Court,Field,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Yoga Studio
100,Kachchh,Indian Restaurant,Historic Site,Shopping Mall,Farm,Castle,Harbor / Marina,Food,Field,Flea Market,Fondue Restaurant


## Cluster 5

In [131]:
india_merged.loc[india_merged['Cluster Labels'] == 4, india_merged.columns[[1] + list(range(5, india_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Kurnool,Train Station,Multiplex,Shopping Mall,Indian Restaurant,Asian Restaurant,Hotel,Yoga Studio,Food & Drink Shop,Flea Market,Fondue Restaurant
7,Srikakulam,Train Station,Pharmacy,ATM,Dhaba,Indian Restaurant,Yoga Studio,Food Court,Flea Market,Fondue Restaurant,Food
9,West Godavari,Train Station,Bakery,Multiplex,Shopping Mall,Asian Restaurant,Café,Yoga Studio,Food Court,Food,Food & Drink Shop
24,Barpeta,Train Station,Gastropub,Hotel,Farm,Dessert Shop,Dhaba,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant
30,Goalpara,Train Station,Hotel,Bus Station,Food Court,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Yoga Studio,Fast Food Restaurant
35,Karbi Anglong,Train Station,Restaurant,ATM,Optical Shop,Indian Restaurant,Mobile Phone Shop,Food Court,Flea Market,Fondue Restaurant,Food
37,Kokrajhar,Train Station,Farm,Park,Mobile Phone Shop,ATM,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Gaming Cafe
39,Nagaon,Train Station,National Park,ATM,Indian Restaurant,Optical Shop,Motorcycle Shop,Café,Mobile Phone Shop,Food Truck,Food
43,Araria,Auto Garage,ATM,Bar,Train Station,Forest,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck
46,Begusarai,Train Station,Yoga Studio,Fast Food Restaurant,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Forest,Food Truck


In [133]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(india_merged['Latitude'], india_merged['Longitude'], india_merged['State'], india_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # 'Cluster Labels' is a float column after the join and may be NaN for unmatched districts,
    # so convert to int before indexing the palette and fall back to gray otherwise
    marker_color = rainbow[int(cluster) - 1] if pd.notna(cluster) else 'gray'
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
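
The palette step can be tried on its own, without rendering a map. The sketch below builds one hex color per cluster and looks a color up from a possibly missing float label. The helper `color_for`, the name `palette` and the gray fallback `#808080` are assumptions for illustration and not part of the notebook.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

n_clusters = 5

# one evenly spaced rainbow color per cluster, as hex strings
palette = [colors.to_hex(c) for c in cm.rainbow(np.linspace(0, 1, n_clusters))]

def color_for(cluster, fallback='#808080'):
    """Return the palette color for a cluster label, or a fallback for missing labels."""
    if cluster is None or np.isnan(cluster):
        return fallback
    return palette[int(cluster)]

print(palette)
print(color_for(2.0), color_for(float('nan')))
```

Indexing the palette with `int(cluster)` directly, rather than `cluster - 1`, maps label 0 to the first color; either convention works as long as it is applied consistently.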