# Battle of Neighborhoods Final Assignment

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Chennai</font></h1>

## Introduction

In this notebook, I will be using the Foursquare API to explore neighborhoods in Chennai. I will also use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. I will use the *k*-means clustering algorithm to complete this task. Finally, I will use the Folium library to visualize the neighborhoods in Chennai City and their emerging clusters.

Before loading and exploring the data, I'll import all the dependencies that are needed.

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
print('Libraries imported.')

Libraries imported.


## 1. Download and Explore Dataset

The data needed for this assignment is read and cleaned in a separate notebook, *Data Cleaning.ipynb*, which stores the relevant fields in a CSV file. That notebook must be run first with the necessary parameters; its CSV output is used here.

In [2]:
df=pd.read_csv('Chennai.csv',index_col=0)
df

| | latitude | longitude | country code | postal code | place name | admin name1 | admin name2 |
|---|---|---|---|---|---|---|---|
| 0 | 12.9194 | 80.1697 | IN | 600078 | Kalaignar Karunanidhi Nagar | Tamil Nadu | Chennai |
| 1 | 12.9675 | 80.2598 | IN | 600041 | Valmiki Nagar | Tamil Nadu | Chennai |
| 2 | 12.9855 | 80.2604 | IN | 600041 | Tiruvanmiyur | Tamil Nadu | Chennai |
| 3 | 13.0156 | 80.2467 | IN | 600085 | Kotturpuram | Tamil Nadu | Chennai |
| 4 | 13.025 | 80.2575 | IN | 600028 | Raja Annamalaipuram,Ramakrishna Nagar (Chennai) | Tamil Nadu | Chennai |
| 5 | 13.0269 | 80.2406 | IN | 600035 | Nandanam | Tamil Nadu | Chennai |
| 6 | 13.0292 | 80.2708 | IN | 600004 | Mandaveli,Mylapore,Vivekananda College Madras | Tamil Nadu | Chennai |
| 7 | 13.038 | 80.2301 | IN | 600113 | TTTI Taramani,Tidel Park | Tamil Nadu | Chennai |
| 8 | 13.0389 | 80.2258 | IN | 600033 | Mambalam R.S.,West Mambalam | Tamil Nadu | Chennai |
| 9 | 13.0433 | 80.2528 | IN | 600018 | Pr. Accountant General,Teynampet | Tamil Nadu | Chennai |


Let's get the geographical coordinates of Chennai.

In [3]:
address = 'Chennai, IN'

geolocator = Nominatim(user_agent="in_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Chennai, IN are 13.0836939, 80.270186.
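geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` raises an `AttributeError`. The following is a minimal sketch of a guard around that call. The helper name `resolve_coordinates` is hypothetical, and it accepts any callable shaped like `geolocator.geocode` so it can be tried without network access.

```python
def resolve_coordinates(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing was found.

    `geocode` is assumed to be a callable like `Nominatim(...).geocode`:
    it takes an address string and returns an object with `.latitude`
    and `.longitude`, or None when there is no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude
```

In this notebook it could be called as `latitude, longitude = resolve_coordinates(geolocator.geocode, address)`.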


In [4]:
# create map of Chennai using latitude and longitude values
# (named chennai_map so that Python's built-in map() is not shadowed)
chennai_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['latitude'], df['longitude'], df['place name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(chennai_map)

chennai_map

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [1]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20210129' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET: 


#### Let's explore the first place in our dataframe.

Get the place's name.

In [6]:
df.loc[0, 'place name']

'Kalaignar Karunanidhi Nagar'

In [7]:
neighborhood_latitude = df.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'place name'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Kalaignar Karunanidhi Nagar are 12.9194, 80.1697.


#### Now, let's get the top 100 venues within a radius of 500 meters of this place.

First, let's create the GET request URL and store it in **url**.

In [8]:
# request up to 100 venues within 500 meters of the neighborhood
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
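As an alternative to formatting the query string by hand, the same URL can be assembled from a dictionary of parameters. This is a minimal sketch using only the standard library; the function name `build_explore_url` is hypothetical, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore request URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude as one value
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes each value, so the comma in `ll` becomes %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

example_url = build_explore_url('', '', '20210129', 12.9194, 80.1697)
print(example_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the string at all.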

Send the GET request and examine the results.

In [9]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6014fa86f0a2d3256374c30c'},
 'response': {'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 12.923900004500004,
    'lng': 80.1743082580065},
   'sw': {'lat': 12.914899995499995, 'lng': 80.1650917419935}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ec91b1b61af9e1431a70bb5',
       'name': 'Jayalakhsmi Bakery',
       'location': {'lat': 12.918770629736093,
        'lng': 80.1712167263031,
        'labeledLatLngs': [{'label': 'display',
          'lat': 12.918770629736093,
          'lng': 80.1712167263031}],
        'distance': 178,
        'cc': 'IN',
        'country': 'India',
        'formattedAd

From the Foursquare lab module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [10]:
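# function that extracts the category of the venue
def get_category_type(row):
    # the column is called 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']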
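To make the behaviour on an empty category list concrete, here is a small stand-alone sketch. The helper `first_category_name` is hypothetical and works on plain Python lists rather than dataframe rows; the second record is made up for illustration.

```python
def first_category_name(categories):
    """Return the name of the first category, or None if the list is empty or missing."""
    if not categories:
        return None
    return categories[0].get('name')

# one venue taken from the response above, one made-up venue without categories
bakery = {'name': 'Jayalakhsmi Bakery', 'categories': [{'name': 'Bakery'}]}
uncategorised = {'name': 'Example venue', 'categories': []}

print(first_category_name(bakery['categories']))         # Bakery
print(first_category_name(uncategorised['categories']))  # None
```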

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [11]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Jayalakhsmi Bakery | Bakery | 12.918771 | 80.171217 |
| 1 | windows reseller hosting | IT Services | 12.919952 | 80.168889 |
| 2 | Namma Aachi | Indian Restaurant | 12.91943 | 80.171652 |
| 3 | Alif Restaurant | Indian Restaurant | 12.919569 | 80.17175 |
| 4 | mcRennet | Bakery | 12.92116 | 80.168175 |


In [12]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

7 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Chennai

#### Let's create a function to repeat the same process for all the neighborhoods in Chennai

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (a venue without any category gets None instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
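The function above assumes every response contains `response -> groups -> items`. If a request fails (for example, when the API quota is exhausted), that chain of lookups raises an error and the whole loop stops. The sketch below separates the parsing from the network call so that a malformed response yields no rows instead. The helper name `rows_from_explore_response` is hypothetical; the sample payload reuses one venue from the response shown earlier.

```python
def rows_from_explore_response(name, lat, lng, payload):
    """Turn one explore response (already decoded from JSON) into a list of row tuples.

    Returns an empty list when the payload has no venue groups.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,
        ))
    return rows

sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Jayalakhsmi Bakery',
               'location': {'lat': 12.918770629736093, 'lng': 80.1712167263031},
               'categories': [{'name': 'Bakery'}]}},
]}]}}

print(rows_from_explore_response('Kalaignar Karunanidhi Nagar', 12.9194, 80.1697, sample_payload))
print(rows_from_explore_response('Kalaignar Karunanidhi Nagar', 12.9194, 80.1697, {'response': {}}))
```

Inside the loop, this could replace the list comprehension, with `payload = requests.get(url).json()`.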

#### Now we write the code to run the above function on each neighborhood and create a new dataframe called *location_venues*.

In [14]:

location_venues = getNearbyVenues(names=df['place name'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Kalaignar Karunanidhi Nagar
Valmiki Nagar
Tiruvanmiyur
Kotturpuram
Raja Annamalaipuram,Ramakrishna Nagar (Chennai)
Nandanam
Mandaveli,Mylapore,Vivekananda College Madras
TTTI Taramani,Tidel Park
Mambalam R.S.,West Mambalam
Pr. Accountant General,Teynampet
Koyambedu Wholesale Market Com,Virugambakkam
Gopalapuram (Chennai)
Saligramam
Kodambakkam
Lloyds Estate,Pudupakkam,Royapettah,Triplicane South
Vadapalani
Tiruvallikkeni
Madras University
Chepauk
Loyola College,Nungambakkam,Nungambakkam High Road
Adyar (Chennai),Aynavaram,Engineering College (Chennai),Kasturibai Nagar,Rajbhavan (Chennai),Shastri Nagar (Chennai)
Ekkaduthangal,Guindy Industrial Estate,Icf Colony,Indian Institute Of Technology
Ashoknagar (Chennai),Flowers Road,Jafferkhanpet
Chetput,World University Centre
Fort St George,Greams Road,Old College Buildings,Shastri Bhavan,Teynampet West
Egmore,Egmore ND,Ethiraj Salai
Koyambedu
Anna Nagar East,Anna Nagar Western Extn,High Court Building (Chennai)
Arumbakkam,D G Vaishnav Colleg

In [15]:
print(location_venues.shape)
location_venues.head()

(439, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Kalaignar Karunanidhi Nagar | 12.9194 | 80.1697 | Jayalakhsmi Bakery | 12.918771 | 80.171217 | Bakery |
| 1 | Kalaignar Karunanidhi Nagar | 12.9194 | 80.1697 | windows reseller hosting | 12.919952 | 80.168889 | IT Services |
| 2 | Kalaignar Karunanidhi Nagar | 12.9194 | 80.1697 | Namma Aachi | 12.91943 | 80.171652 | Indian Restaurant |
| 3 | Kalaignar Karunanidhi Nagar | 12.9194 | 80.1697 | Alif Restaurant | 12.919569 | 80.17175 | Indian Restaurant |
| 4 | Kalaignar Karunanidhi Nagar | 12.9194 | 80.1697 | mcRennet | 12.92116 | 80.168175 | Bakery |


In [16]:
location_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Adyar (Chennai),Aynavaram,Engineering College (Chennai),Kasturibai Nagar,Rajbhavan (Chennai),Shastri Nagar (Chennai) | 16 | 16 | 16 | 16 | 16 | 16 |
| Agaram | 5 | 5 | 5 | 5 | 5 | 5 |
| Aminjikarai | 13 | 13 | 13 | 13 | 13 | 13 |
| Anna Nagar (Chennai),Velacheri | 1 | 1 | 1 | 1 | 1 | 1 |
| Anna Nagar East,Anna Nagar Western Extn,High Court Building (Chennai) | 8 | 8 | 8 | 8 | 8 | 8 |
| Arumbakkam,D G Vaishnav College | 8 | 8 | 8 | 8 | 8 | 8 |
| Ashoknagar (Chennai),Flowers Road,Jafferkhanpet | 14 | 14 | 14 | 14 | 14 | 14 |
| Chennai G.P.O.,Flower Bazaar,Govt Stanley Hospital,MPT AO,Mint Building | 1 | 1 | 1 | 1 | 1 | 1 |
| Chepauk | 8 | 8 | 8 | 8 | 8 | 8 |
| Chetput,World University Centre | 19 | 19 | 19 | 19 | 19 | 19 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [17]:
print('There are {} unique categories.'.format(len(location_venues['Venue Category'].unique())))

There are 102 unique categories.


## 3. Analyze Each Neighborhood

In [18]:
# one hot encoding
location_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
location_onehot['Neighborhood'] = location_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [location_onehot.columns[-1]] + list(location_onehot.columns[:-1])
location_onehot = location_onehot[fixed_columns]

location_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,...,Stadium,Tea Room,Thai Restaurant,Theater,Train,Train Station,Vegetarian / Vegan Restaurant,Warehouse Store,Women's Store,Yoga Studio
0,Kalaignar Karunanidhi Nagar,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,Kalaignar Karunanidhi Nagar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Kalaignar Karunanidhi Nagar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Kalaignar Karunanidhi Nagar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Kalaignar Karunanidhi Nagar,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
location_onehot.shape

(439, 103)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [20]:
location_grouped = location_onehot.groupby('Neighborhood').mean().reset_index()
location_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,...,Stadium,Tea Room,Thai Restaurant,Theater,Train,Train Station,Vegetarian / Vegan Restaurant,Warehouse Store,Women's Store,Yoga Studio
0,"Adyar (Chennai),Aynavaram,Engineering College ...",0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agaram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aminjikarai,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Anna Nagar (Chennai),Velacheri",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Anna Nagar East,Anna Nagar Western Extn,High C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
5,"Arumbakkam,D G Vaishnav College",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
6,"Ashoknagar (Chennai),Flowers Road,Jafferkhanpet",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Chennai G.P.O.,Flower Bazaar,Govt Stanley Hosp...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Chepauk,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0
9,"Chetput,World University Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [21]:
location_grouped.shape

(48, 103)
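Because each venue row holds a single 1 in its category column, the per-neighborhood mean of the one-hot columns is simply the share of that neighborhood's venues falling in each category. The sketch below reproduces that computation in plain Python as an illustration. The function name `category_frequencies` is hypothetical, and the input uses only the five venues shown in the `head()` output earlier, so the resulting figures are illustrative rather than the neighborhood's actual frequencies.

```python
from collections import Counter, defaultdict

def category_frequencies(pairs):
    """Map each neighborhood to {category: share of its venues in that category}.

    `pairs` is an iterable of (neighborhood, venue_category) tuples.
    """
    counts = defaultdict(Counter)
    for hood, category in pairs:
        counts[hood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }

pairs = [
    ('Kalaignar Karunanidhi Nagar', 'Bakery'),
    ('Kalaignar Karunanidhi Nagar', 'IT Services'),
    ('Kalaignar Karunanidhi Nagar', 'Indian Restaurant'),
    ('Kalaignar Karunanidhi Nagar', 'Indian Restaurant'),
    ('Kalaignar Karunanidhi Nagar', 'Bakery'),
]
freqs = category_frequencies(pairs)
print(freqs['Kalaignar Karunanidhi Nagar']['Bakery'])       # 0.4
print(freqs['Kalaignar Karunanidhi Nagar']['IT Services'])  # 0.2
```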

#### Let's print each neighborhood along with the top 5 most common venues

In [22]:
num_top_venues = 5

for hood in location_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = location_grouped[location_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adyar (Chennai),Aynavaram,Engineering College (Chennai),Kasturibai Nagar,Rajbhavan (Chennai),Shastri Nagar (Chennai)----
                  venue  freq
0     Indian Restaurant  0.31
1   Japanese Restaurant  0.06
2  Fast Food Restaurant  0.06
3           Coffee Shop  0.06
4    Chinese Restaurant  0.06


----Agaram----
                 venue  freq
0            Multiplex   0.4
1  Indie Movie Theater   0.2
2       Clothing Store   0.2
3                  Gym   0.2
4    Afghan Restaurant   0.0


----Aminjikarai----
                  venue  freq
0  Fast Food Restaurant  0.23
1           Pizza Place  0.15
2             Multiplex  0.08
3             Bookstore  0.08
4        Clothing Store  0.08


----Anna Nagar (Chennai),Velacheri----
               venue  freq
0   Business Service   1.0
1  Afghan Restaurant   0.0
2                Pub   0.0
3           Platform   0.0
4        Pizza Place   0.0


----Anna Nagar East,Anna Nagar Western Extn,High Court Building (Chennai)----
                   

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
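Note that sorting all categories by frequency pads the list with zero-frequency categories whenever a neighborhood has fewer distinct categories than `num_top_venues`. One possible refinement, sketched here in plain Python, keeps only categories that actually occur and breaks ties alphabetically. The function name `top_categories` is hypothetical; the sample frequencies are the Agaram values printed above.

```python
def top_categories(freqs, n):
    """Return up to n category names with non-zero frequency, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    present = [(cat, f) for cat, f in freqs.items() if f > 0]
    present.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in present[:n]]

agaram = {'Multiplex': 0.4, 'Indie Movie Theater': 0.2, 'Clothing Store': 0.2,
          'Gym': 0.2, 'Afghan Restaurant': 0.0}
print(top_categories(agaram, 10))
# ['Multiplex', 'Clothing Store', 'Gym', 'Indie Movie Theater']
```

The trade-off is that the result no longer has a fixed length, so it would need padding before being written into a fixed set of dataframe columns.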

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [24]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = location_grouped['Neighborhood']

for ind in np.arange(location_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(location_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adyar (Chennai),Aynavaram,Engineering College ...",Indian Restaurant,Bakery,Multicuisine Indian Restaurant,Fast Food Restaurant,Chinese Restaurant,Gourmet Shop,Coffee Shop,Café,Asian Restaurant,Hotel
1,Agaram,Multiplex,Indie Movie Theater,Clothing Store,Gym,Flower Shop,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store
2,Aminjikarai,Fast Food Restaurant,Pizza Place,Shopping Mall,American Restaurant,Clothing Store,Multiplex,Electronics Store,Café,Furniture / Home Store,Bookstore
3,"Anna Nagar (Chennai),Velacheri",Business Service,Yoga Studio,Concert Hall,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store,Falafel Restaurant,Fast Food Restaurant
4,"Anna Nagar East,Anna Nagar Western Extn,High C...",Vegetarian / Vegan Restaurant,Hotel,Metro Station,Food Court,Fast Food Restaurant,Bus Station,Cricket Ground,Daycare,Department Store,Dessert Shop
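The column labels in this table carry ordinal suffixes. Taking the suffix from a three-item list and falling back to "th" is enough for the ten columns used here, but would label a 21st column as "21th". A general helper could look like the following sketch; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[10])  # 1st Most Common Venue | 10th Most Common Venue
```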


Some rows may lack Foursquare data, so we drop any rows with missing values before proceeding.

In [25]:
neighborhoods_venues_sorted=neighborhoods_venues_sorted.dropna()


## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [26]:
# set number of clusters
kclusters = 5

location_grouped_clustering = location_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(location_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 0, 0, 0, 1, 0, 0])
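To show what the fitting step does conceptually, here is a minimal plain-Python sketch of Lloyd's algorithm, the basic iteration behind *k*-means. It is an illustration only, not scikit-learn's implementation: `KMeans` uses a smarter initialisation (k-means++ by default), whereas this sketch takes the starting centroids as an argument. The function names and the four two-dimensional frequency vectors are made up for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# made-up (restaurant share, multiplex share) vectors for four neighborhoods
points = [(0.5, 0.0), (0.4, 0.1), (0.0, 0.5), (0.1, 0.4)]
labels, centroids = lloyd_kmeans(points, centroids=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]
```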

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

location_merged = df

# join the sorted venues onto df to add latitude/longitude for each neighborhood
location_merged = location_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='place name')

location_merged.head() # check the last columns!

Unnamed: 0,latitude,longitude,country code,postal code,place name,admin name1,admin name2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,12.9194,80.1697,IN,600078,Kalaignar Karunanidhi Nagar,Tamil Nadu,Chennai,0.0,Indian Restaurant,Bakery,IT Services,Clothing Store,Café,Yoga Studio,Food Court,Daycare,Department Store,Dessert Shop
1,12.9675,80.2598,IN,600041,Valmiki Nagar,Tamil Nadu,Chennai,0.0,Fast Food Restaurant,Indian Restaurant,Ice Cream Shop,Yoga Studio,Food Court,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store
2,12.9855,80.2604,IN,600041,Tiruvanmiyur,Tamil Nadu,Chennai,2.0,Performing Arts Venue,Yoga Studio,Food Court,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store,Falafel Restaurant,Fast Food Restaurant
3,13.0156,80.2467,IN,600085,Kotturpuram,Tamil Nadu,Chennai,0.0,Chinese Restaurant,Indian Restaurant,Department Store,Market,Food Court,Cricket Ground,Daycare,Dessert Shop,Electronics Store,Falafel Restaurant
4,13.025,80.2575,IN,600028,"Raja Annamalaipuram,Ramakrishna Nagar (Chennai)",Tamil Nadu,Chennai,0.0,Pizza Place,Yoga Studio,Restaurant,Arcade,Bar,Camera Store,Department Store,Dessert Shop,Gym / Fitness Center,Ice Cream Shop


Some locations are not clustered because Foursquare returned no venues for them, so we drop those rows and then convert the cluster labels to integers for visualisation.

In [28]:
location_merged=location_merged.dropna()
location_merged['Cluster Labels'] =location_merged['Cluster Labels'].astype(int)

Finally, let's visualize the resulting clusters

In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(location_merged['latitude'], location_merged['longitude'], location_merged['place name'], location_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
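The loop above indexes the palette with `rainbow[cluster-1]`, so cluster label 0 receives the last colour through Python's negative indexing. That works, but indexing directly by label is easier to follow. The sketch below builds one hex colour per label with the standard-library `colorsys` module; this is a swapped-in technique for illustration, not the matplotlib `rainbow` colormap used in the notebook, and the name `cluster_palette` is hypothetical.

```python
import colorsys

def cluster_palette(k):
    """Return k distinct hex colours, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # spread the hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
print(palette)
```

With such a list, a marker's colour would be `palette[cluster]`.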

## 5. Examine Clusters

Now, we will examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

#### Cluster 1

In [30]:
location_merged.loc[location_merged['Cluster Labels'] == 0, location_merged.columns[[1] + list(range(5, location_merged.shape[1]))]]

Unnamed: 0,longitude,admin name1,admin name2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,80.1697,Tamil Nadu,Chennai,0,Indian Restaurant,Bakery,IT Services,Clothing Store,Café,Yoga Studio,Food Court,Daycare,Department Store,Dessert Shop
1,80.2598,Tamil Nadu,Chennai,0,Fast Food Restaurant,Indian Restaurant,Ice Cream Shop,Yoga Studio,Food Court,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store
3,80.2467,Tamil Nadu,Chennai,0,Chinese Restaurant,Indian Restaurant,Department Store,Market,Food Court,Cricket Ground,Daycare,Dessert Shop,Electronics Store,Falafel Restaurant
4,80.2575,Tamil Nadu,Chennai,0,Pizza Place,Yoga Studio,Restaurant,Arcade,Bar,Camera Store,Department Store,Dessert Shop,Gym / Fitness Center,Ice Cream Shop
5,80.2406,Tamil Nadu,Chennai,0,Ice Cream Shop,Playground,Park,Sports Bar,Flower Shop,Convenience Store,Cricket Ground,Daycare,Department Store,Dessert Shop
6,80.2708,Tamil Nadu,Chennai,0,Indian Restaurant,Vegetarian / Vegan Restaurant,Clothing Store,Juice Bar,Food Court,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store
7,80.2301,Tamil Nadu,Chennai,0,Clothing Store,Shoe Store,Boutique,Miscellaneous Shop,Café,Snack Place,South Indian Restaurant,Jewelry Store,Pizza Place,Asian Restaurant
8,80.2258,Tamil Nadu,Chennai,0,Indian Restaurant,South Indian Restaurant,Road,Miscellaneous Shop,Snack Place,Train Station,Health & Beauty Service,Fast Food Restaurant,Convenience Store,Cricket Ground
9,80.2528,Tamil Nadu,Chennai,0,Indian Restaurant,Lounge,Italian Restaurant,Bowling Alley,Hotel,Women's Store,Health & Beauty Service,Gym / Fitness Center,Juice Bar,Kerala Restaurant
10,80.1913,Tamil Nadu,Chennai,0,Multiplex,Fast Food Restaurant,Burger Joint,Pizza Place,Indian Restaurant,Gym,Chinese Restaurant,Chettinad Restaurant,Café,Sandwich Place


#### Cluster 2

In [31]:
location_merged.loc[location_merged['Cluster Labels'] == 1, location_merged.columns[[1] + list(range(5, location_merged.shape[1]))]]

Unnamed: 0,longitude,admin name1,admin name2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,80.2642,Tamil Nadu,Chennai,1,Indian Restaurant,Record Shop,Smoke Shop,Flower Shop,Convenience Store,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store
37,80.2842,Tamil Nadu,Chennai,1,Indian Restaurant,Yoga Studio,Frozen Yogurt Shop,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store,Falafel Restaurant,Fast Food Restaurant


#### Cluster 3

In [32]:
location_merged.loc[location_merged['Cluster Labels'] == 2, location_merged.columns[[1] + list(range(5, location_merged.shape[1]))]]

Unnamed: 0,longitude,admin name1,admin name2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,80.2604,Tamil Nadu,Chennai,2,Performing Arts Venue,Yoga Studio,Food Court,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store,Falafel Restaurant,Fast Food Restaurant


#### Cluster 4

In [33]:
location_merged.loc[location_merged['Cluster Labels'] == 3, location_merged.columns[[1] + list(range(5, location_merged.shape[1]))]]

Unnamed: 0,longitude,admin name1,admin name2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,79.936,Tamil Nadu,Chennai,3,Business Service,Yoga Studio,Concert Hall,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store,Falafel Restaurant,Fast Food Restaurant


#### Cluster 5

In [34]:
location_merged.loc[location_merged['Cluster Labels'] == 4, location_merged.columns[[1] + list(range(5, location_merged.shape[1]))]]

Unnamed: 0,longitude,admin name1,admin name2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,80.2647,Tamil Nadu,Chennai,4,Market,Yoga Studio,Frozen Yogurt Shop,Cricket Ground,Daycare,Department Store,Dessert Shop,Electronics Store,Falafel Restaurant,Fast Food Restaurant
