# IBM Applied Data Science Capstone Course by Coursera
### Week 5 Final Report
**_Opening a New Coffee Shop in Chennai_**
- Build a dataframe of neighborhoods in Chennai by scraping the data from a Wikipedia page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new coffee shop
***
### 1. Import libraries

In [12]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from Wikipedia page into a DataFrame

In [15]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Suburbs_of_Chennai").text

In [16]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [17]:
# create a list to store neighborhood data
neighborhoodList = []

In [18]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [26]:
# create a new DataFrame from the list
chennai_suburbs = pd.DataFrame({"Neighborhood": neighborhoodList})

chennai_suburbs.head()

Unnamed: 0,Neighborhood
0,Alandur
1,Anna Nagar
2,"Ashok Nagar, Chennai"
3,Assisi Nagar
4,Ayanavaram


In [27]:
# print the number of rows of the dataframe
chennai_suburbs.shape

(65, 1)

### 3. Get the geographical coordinates

In [24]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Chennai, Tamil Nadu'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
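
The loop above never ends if a neighborhood cannot be geocoded. A minimal sketch of a bounded retry follows, assuming a hypothetical helper name `get_latlng_with_retry` and hypothetical parameters `max_attempts` and `delay_seconds`. The geocoder is passed in as a callable so the helper can be tried without network access.

```python
import time

def get_latlng_with_retry(neighborhood, geocode, max_attempts=5, delay_seconds=1.0):
    """Return [lat, lng] for a neighborhood, or None after max_attempts failures.

    `geocode` is any callable that takes a query string and returns a
    [lat, lng] pair or None (hypothetical interface chosen for this sketch).
    """
    query = '{}, Chennai, Tamil Nadu'.format(neighborhood)
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        # wait before the next attempt, but not after the last one
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None
```

In this notebook the callable could be `lambda q: geocoder.arcgis(q).latlng`. Neighborhoods that come back as `None` would then need to be dropped or fixed by hand before building the DataFrame.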

In [38]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in chennai_suburbs["Neighborhood"].tolist() ]

In [39]:
coords

[[13.00013000000007, 80.20060000000007],
 [13.083590000000072, 80.21020000000004],
 [13.035390000000064, 80.21220000000005],
 [13.164570000000026, 80.23274000000004],
 [13.09883000000002, 80.23238000000003],
 [12.932770000000062, 80.14387000000005],
 [12.95234000000005, 80.14411000000007],
 [12.988610000000051, 80.15100000000007],
 [12.82725000000005, 80.22866000000005],
 [12.837900000000047, 80.05327000000005],
 [13.040920000000028, 80.13649000000004],
 [13.11035000000004, 80.21301000000005],
 [13.129720000000077, 80.18300000000005],
 [13.120580000000075, 80.06047000000007],
 [12.956150000000036, 80.17885000000007],
 [12.79639000000003, 80.22294000000005],
 [13.081980000000044, 80.24448000000007],
 [13.051520000000039, 80.22421000000008],
 [13.136630000000025, 80.24479000000008],
 [13.09609000000006, 80.05288000000007],
 [13.116800000000069, 80.27726000000007],
 [13.183260000000075, 80.24059000000005],
 [12.905290000000036, 80.15352000000007],
 [13.157520000000034, 80.24283000000008],
 ...]

In [40]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [41]:
# merge the coordinates into the original dataframe
chennai_suburbs['Latitude'] = df_coords['Latitude']
chennai_suburbs['Longitude'] = df_coords['Longitude']

In [42]:
# check the neighborhoods and the coordinates
print(chennai_suburbs.shape)
chennai_suburbs

(65, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Alandur,13.00013,80.2006
1,Anna Nagar,13.08359,80.2102
2,"Ashok Nagar, Chennai",13.03539,80.2122
3,Assisi Nagar,13.16457,80.23274
4,Ayanavaram,13.09883,80.23238
5,Chitlapakkam,12.93277,80.14387
6,Chromepet,12.95234,80.14411
7,Cowl Bazaar,12.98861,80.151
8,Egattur (Kanchipuram District),12.82725,80.22866
9,Guduvancheri,12.8379,80.05327


In [43]:
# save the DataFrame as CSV file
chennai_suburbs.to_csv("chennai_suburbs.csv", index=False)

### 4. Create a map of Chennai with neighborhoods superimposed on top

In [49]:
# create map of Chennai using latitude and longitude values

latitude,longitude = geocoder.arcgis('Chennai, Tamil Nadu').latlng 
map_chennai = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(chennai_suburbs['Latitude'], chennai_suburbs['Longitude'], chennai_suburbs['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_chennai)  
    
map_chennai

In [77]:
# save the map as HTML file
map_chennai.save('map_chennai.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [50]:
CLIENT_ID = 'FAJKGRXDXTWC5LBGA5BY1H0KS5XQ124T0M3FOJ5KDO1VUS2R' # your Foursquare ID
CLIENT_SECRET = '1KOMA2RH5TFVJRUVDSK00GUCZJ3KW5UVHXFMD0DBPEDCCOMP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FAJKGRXDXTWC5LBGA5BY1H0KS5XQ124T0M3FOJ5KDO1VUS2R
CLIENT_SECRET:1KOMA2RH5TFVJRUVDSK00GUCZJ3KW5UVHXFMD0DBPEDCCOMP
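
Hard-coding and printing API credentials in a notebook exposes them to anyone who can read the file. One possible alternative is to read them from environment variables. The sketch below assumes the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and the helper name is also made up for illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names are assumptions for this sketch; any names work
    as long as they match how the variables were exported.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret
```

With the variables exported in the shell, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two hard-coded assignments.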


#### Defining the radius and URL for the Foursquare API

In [54]:


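# NOTE: radius and LIMIT are not defined in any cell shown here; the values below are assumptions
radius = 2000 # search radius in meters (assumed value)
LIMIT = 100 # maximum number of venues per request (assumed value)

venues = []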

for lat, long, neighborhood in zip(chennai_suburbs['Latitude'], chennai_suburbs['Longitude'], chennai_suburbs['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
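
The loop above raises an error if a response has no groups or a venue has no category. A minimal sketch of a more tolerant parser follows. The nesting `response -> groups -> items -> venue` is taken from the indexing used above; the function name `extract_venues` and the `"Unknown"` placeholder are assumptions for illustration.

```python
def extract_venues(neighborhood, lat, lng, payload):
    """Flatten one decoded explore response into venue tuples.

    `payload` is the decoded JSON of one request, i.e. requests.get(url).json().
    Missing groups or categories are tolerated instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to a placeholder when the venue has no category (assumed label)
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# hand-written sample payload; the second venue is made up to show the fallback
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Pizza Hut",
               "location": {"lat": 13.00158, "lng": 80.198461},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Example venue without category",
               "location": {"lat": 13.0, "lng": 80.2},
               "categories": []}},
]}]}}
rows = extract_venues("Alandur", 13.00013, 80.2006, sample_payload)
```

Inside the request loop, `venues.extend(extract_venues(neighborhood, lat, long, requests.get(url).json()))` would then take the place of the inner `for` loop.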

In [55]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(271, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Alandur,13.00013,80.2006,Sukkkubai Beef Biryani Shop,12.998769,80.201381,Indian Restaurant
1,Alandur,13.00013,80.2006,The Grand Sweets & Snacks,13.001746,80.198967,Indian Restaurant
2,Alandur,13.00013,80.2006,Asif Brothers Restaurant,13.001519,80.199085,Indian Restaurant
3,Alandur,13.00013,80.2006,Pizza Hut,13.00158,80.198461,Pizza Place
4,Alandur,13.00013,80.2006,Apoorva Restaurant,13.001583,80.198952,Breakfast Spot


**Let's check how many venues were returned for each neighborhood**

In [56]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Alandur,8,8,8,8,8,8
Anna Nagar,32,32,32,32,32,32
"Ashok Nagar, Chennai",19,19,19,19,19,19
Ayanavaram,5,5,5,5,5,5
Chitlapakkam,3,3,3,3,3,3
Chromepet,13,13,13,13,13,13
Cowl Bazaar,1,1,1,1,1,1
Egattur (Kanchipuram District),5,5,5,5,5,5
Guduvancheri,4,4,4,4,4,4
Iyyapanthangal,3,3,3,3,3,3


**Let's find out how many unique categories can be curated from all the returned venues**

In [57]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 83 unique categories.


#### Group the restaurants into food joints

In [68]:

foodjnts = []
for venue in venues_df['VenueCategory'].unique():
    # keep every category whose name contains "Restaurant"
    if "Restaurant" in venue:
        foodjnts.append(venue)
foodjnts

['Indian Restaurant',
 'Chinese Restaurant',
 'Fast Food Restaurant',
 'South Indian Restaurant',
 'American Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Restaurant',
 'Asian Restaurant',
 'Italian Restaurant',
 'Seafood Restaurant',
 'Hyderabadi Restaurant',
 'Indian Chinese Restaurant',
 'Kerala Restaurant',
 'Middle Eastern Restaurant']

In [74]:
venues_newdf = venues_df.copy()
# relabel every category containing "Restaurant" as "Food Joint"; keep the rest unchanged
venues_newdf["NewCategory"] = venues_newdf["VenueCategory"].apply(lambda x: "Food Joint" if "Restaurant" in x else x)
# number of distinct categories after grouping
len(venues_newdf["NewCategory"].unique())

70
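
The same relabeling can be done without a Python-level lambda. The sketch below uses a small hand-made Series rather than the real `venues_df`, and shows a vectorized variant based on `Series.str.contains` and `Series.where`.

```python
import pandas as pd

# small hand-made sample standing in for venues_df["VenueCategory"]
categories = pd.Series(["Indian Restaurant", "Pizza Place", "Restaurant", "Coffee Shop"])

# True wherever the category name contains the literal text "Restaurant"
is_restaurant = categories.str.contains("Restaurant", regex=False)

# Series.where keeps values where the condition holds and substitutes elsewhere
new_category = categories.where(~is_restaurant, "Food Joint")
print(new_category.tolist())
# ['Food Joint', 'Pizza Place', 'Food Joint', 'Coffee Shop']
```

Both approaches should give the same column. The vectorized form mainly reads more directly as "replace where the name contains Restaurant".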

In [58]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

array(['Indian Restaurant', 'Pizza Place', 'Breakfast Spot',
       'Train Station', 'Burger Joint', 'Metro Station', 'Sandwich Place',
       'Gym', 'Chinese Restaurant', 'Coffee Shop', 'Ice Cream Shop',
       'Snack Place', 'Shoe Store', 'Fast Food Restaurant', 'Park',
       'South Indian Restaurant', 'American Restaurant', 'Juice Bar',
       'Farmers Market', 'Vegetarian / Vegan Restaurant', 'BBQ Joint',
       'Clothing Store', 'Bakery', 'Department Store',
       'Electronics Store', 'Bus Station', 'Grocery Store',
       'Sculpture Garden', 'Diner', 'Café', 'Hotel', 'Tennis Court',
       'Restaurant', 'Food & Drink Shop', 'Pharmacy', 'Movie Theater',
       'Big Box Store', 'Shopping Mall', 'Light Rail Station', 'Food',
       'Asian Restaurant', 'Hotel Bar', 'Leather Goods Store',
       "Men's Store", 'Smoke Shop', 'Food Truck', 'Platform',
       'Photography Studio', 'Sporting Goods Shop', 'Auto Workshop'],
      dtype=object)

In [69]:
venues_df

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Alandur,13.00013,80.2006,Sukkkubai Beef Biryani Shop,12.998769,80.201381,Indian Restaurant
1,Alandur,13.00013,80.2006,The Grand Sweets & Snacks,13.001746,80.198967,Indian Restaurant
2,Alandur,13.00013,80.2006,Asif Brothers Restaurant,13.001519,80.199085,Indian Restaurant
3,Alandur,13.00013,80.2006,Pizza Hut,13.00158,80.198461,Pizza Place
4,Alandur,13.00013,80.2006,Apoorva Restaurant,13.001583,80.198952,Breakfast Spot
5,Alandur,13.00013,80.2006,St. Thomas mount railway station,12.998494,80.20374,Train Station
6,Alandur,13.00013,80.2006,Marrybrown,13.003648,80.200471,Burger Joint
7,Alandur,13.00013,80.2006,Alandur Metro Station,13.004158,80.201363,Metro Station
8,Anna Nagar,13.08359,80.2102,Subway,13.082455,80.210927,Sandwich Place
9,Anna Nagar,13.08359,80.2102,99°F Fitness Studio,13.084923,80.211343,Gym


### 6. Analyze Each Neighborhood

In [75]:
# one hot encoding
Cs_onehot = pd.get_dummies(venues_newdf[['NewCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Cs_onehot['Neighborhoods'] = venues_newdf['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Cs_onehot.columns[-1]] + list(Cs_onehot.columns[:-1])
Cs_onehot = Cs_onehot[fixed_columns]

print(Cs_onehot.shape)
Cs_onehot.head()

(271, 71)


Unnamed: 0,Neighborhoods,ATM,Airport Terminal,Auto Workshop,BBQ Joint,Badminton Court,Bakery,Bar,Big Box Store,Breakfast Spot,Burger Joint,Bus Station,Café,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Cricket Ground,Department Store,Diner,Electronics Store,Farmers Market,Food,Food & Drink Shop,Food Court,Food Joint,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Grocery Store,Gym,Gym / Fitness Center,Hotel,Hotel Bar,Ice Cream Shop,Juice Bar,Lake,Leather Goods Store,Light Rail Station,Market,Medical Supply Store,Men's Store,Metro Station,Motel,Movie Theater,Multiplex,Optical Shop,Park,Pet Store,Pharmacy,Photography Studio,Pizza Place,Platform,Playground,Resort,Sandwich Place,Sculpture Garden,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Tennis Court,Train Station,Wine Shop,Women's Store,Yoga Studio
0,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alandur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alandur,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group the rows by neighborhood and take the mean of the Food Joint column, which gives the share of each neighborhood's venues that are Food Joints**

In [76]:
Cs_foodjoint = Cs_onehot.groupby('Neighborhoods').agg({'Food Joint': 'mean'}).reset_index()
print(Cs_foodjoint.shape)
Cs_foodjoint

(49, 2)


Unnamed: 0,Neighborhoods,Food Joint
0,Alandur,0.375
1,Anna Nagar,0.375
2,"Ashok Nagar, Chennai",0.315789
3,Ayanavaram,0.2
4,Chitlapakkam,0.333333
5,Chromepet,0.230769
6,Cowl Bazaar,1.0
7,Egattur (Kanchipuram District),0.2
8,Guduvancheri,0.5
9,Iyyapanthangal,0.333333
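
To see why the mean of a one-hot column is a share, here is a small worked sketch using the eight Alandur venues listed earlier, with the three Indian restaurants already relabeled as Food Joint. The variable names are chosen for illustration only.

```python
import pandas as pd

# the eight Alandur venues from the venue table, after grouping restaurants
sample = pd.DataFrame({
    "Neighborhood": ["Alandur"] * 8,
    "NewCategory": ["Food Joint", "Food Joint", "Food Joint", "Pizza Place",
                    "Breakfast Spot", "Train Station", "Burger Joint", "Metro Station"],
})

# one column per category, 1 where the venue belongs to it, 0 otherwise
onehot = pd.get_dummies(sample["NewCategory"], dtype=int)
onehot.insert(0, "Neighborhood", sample["Neighborhood"])

# averaging the 0/1 column gives 3 Food Joints out of 8 venues
share = onehot.groupby("Neighborhood")["Food Joint"].mean()
print(share["Alandur"])  # 0.375
```

This matches the 0.375 reported for Alandur above. Note that a neighborhood with a single returned venue, such as Cowl Bazaar, gets a share of 1.0 from just one restaurant.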


### Coffee shop in Chennai

In [77]:
Cs_CoffeeShop = Cs_onehot.groupby('Neighborhoods').agg({'Coffee Shop': 'mean'}).reset_index()
print(Cs_CoffeeShop.shape)
Cs_CoffeeShop

(49, 2)


Unnamed: 0,Neighborhoods,Coffee Shop
0,Alandur,0.0
1,Anna Nagar,0.03125
2,"Ashok Nagar, Chennai",0.0
3,Ayanavaram,0.0
4,Chitlapakkam,0.0
5,Chromepet,0.0
6,Cowl Bazaar,0.0
7,Egattur (Kanchipuram District),0.0
8,Guduvancheri,0.0
9,Iyyapanthangal,0.0


In [62]:
Cs_grouped = Cs_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(Cs_grouped.shape)
Cs_grouped

(49, 84)


Unnamed: 0,Neighborhoods,ATM,Airport Terminal,American Restaurant,Asian Restaurant,Auto Workshop,BBQ Joint,Badminton Court,Bakery,Bar,Big Box Store,Breakfast Spot,Burger Joint,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Cricket Ground,Department Store,Diner,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Grocery Store,Gym,Gym / Fitness Center,Hotel,Hotel Bar,Hyderabadi Restaurant,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Italian Restaurant,Juice Bar,Kerala Restaurant,Lake,Leather Goods Store,Light Rail Station,Market,Medical Supply Store,Men's Store,Metro Station,Middle Eastern Restaurant,Motel,Movie Theater,Multiplex,Optical Shop,Park,Pet Store,Pharmacy,Photography Studio,Pizza Place,Platform,Playground,Resort,Restaurant,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Sporting Goods Shop,Tennis Court,Train Station,Vegetarian / Vegan Restaurant,Wine Shop,Women's Store,Yoga Studio
0,Alandur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
1,Anna Nagar,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0625,0.0625,0.03125,0.0,0.0,0.0,0.0625,0.0,0.03125,0.03125,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0
2,"Ashok Nagar, Chennai",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.052632,0.052632,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0
3,Ayanavaram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0
4,Chitlapakkam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chromepet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0
6,Cowl Bazaar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Egattur (Kanchipuram District),0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Guduvancheri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Iyyapanthangal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Chennai into 3 clusters.
We cluster on the Food Joint share because we need to know how food joints are distributed across the neighborhoods.

In [78]:
# set number of clusters
kclusters = 3

Cs_clustering = Cs_foodjoint.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Cs_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 1, 0, 0, 2, 1, 0, 0])
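
The label numbers that k-means assigns are arbitrary: in the table that follows, label 1 happens to hold the lowest Food Joint shares and label 2 the highest. A minimal sketch of renumbering the labels by ascending mean share is given below. The helper name `relabel_by_mean` is hypothetical, and the sample values are the first ten rows shown in this notebook.

```python
def relabel_by_mean(values, labels):
    """Renumber cluster labels so that 0 is the cluster with the lowest mean value."""
    totals, counts = {}, {}
    for value, label in zip(values, labels):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    means = {label: totals[label] / counts[label] for label in totals}
    # old labels ordered from lowest to highest mean
    ordered = sorted(means, key=means.get)
    mapping = {old: new for new, old in enumerate(ordered)}
    return [mapping[label] for label in labels]

# first ten Food Joint shares and their k-means labels from this notebook
shares = [0.375, 0.375, 0.315789, 0.2, 0.333333, 0.230769, 1.0, 0.2, 0.5, 0.333333]
labels = [0, 0, 0, 1, 0, 0, 2, 1, 0, 0]
print(relabel_by_mean(shares, labels))
# [1, 1, 1, 0, 1, 1, 2, 0, 1, 1]
```

With the real model, `kmeans.labels_.tolist()` would be passed as `labels`. The rest of this notebook keeps the original label numbers, so this step is optional.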

In [80]:
# create a new dataframe that holds the Food Joint share and the cluster label for each neighborhood
Cs_foodjointcluster = Cs_foodjoint.copy()

# add clustering labels
Cs_foodjointcluster["Cluster Labels"] = kmeans.labels_

In [81]:
Cs_foodjointcluster

Unnamed: 0,Neighborhoods,Food Joint,Cluster Labels
0,Alandur,0.375,0
1,Anna Nagar,0.375,0
2,"Ashok Nagar, Chennai",0.315789,0
3,Ayanavaram,0.2,1
4,Chitlapakkam,0.333333,0
5,Chromepet,0.230769,0
6,Cowl Bazaar,1.0,2
7,Egattur (Kanchipuram District),0.2,1
8,Guduvancheri,0.5,0
9,Iyyapanthangal,0.333333,0


In [82]:
Cs_foodjointcluster.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
Cs_foodjointcluster.head()

Unnamed: 0,Neighborhood,Food Joint,Cluster Labels
0,Alandur,0.375,0
1,Anna Nagar,0.375,0
2,"Ashok Nagar, Chennai",0.315789,0
3,Ayanavaram,0.2,1
4,Chitlapakkam,0.333333,0


In [84]:
# merge food Joint with chennai suburb data to add latitude/longitude for each neighborhood
Cs_foodjointcluster = Cs_foodjointcluster.join(chennai_suburbs.set_index("Neighborhood"), on="Neighborhood")

print(Cs_foodjointcluster.shape)
Cs_foodjointcluster.head() # check the last columns!

(49, 5)


Unnamed: 0,Neighborhood,Food Joint,Cluster Labels,Latitude,Longitude
0,Alandur,0.375,0,13.00013,80.2006
1,Anna Nagar,0.375,0,13.08359,80.2102
2,"Ashok Nagar, Chennai",0.315789,0,13.03539,80.2122
3,Ayanavaram,0.2,1,13.09883,80.23238
4,Chitlapakkam,0.333333,0,12.93277,80.14387


In [88]:
# sort the results by Cluster Labels
print(Cs_foodjointcluster.shape)
Cs_foodjointcluster.sort_values(["Cluster Labels"], inplace=True)
Cs_foodjointcluster

(49, 5)


Unnamed: 0,Neighborhood,Food Joint,Cluster Labels,Latitude,Longitude
0,Alandur,0.375,0,13.00013,80.2006
23,Nazarethpettai,0.5,0,13.0371,80.05755
4,Chitlapakkam,0.333333,0,12.93277,80.14387
9,Iyyapanthangal,0.333333,0,13.04092,80.13649
8,Guduvancheri,0.5,0,12.8379,80.05327
1,Anna Nagar,0.375,0,13.08359,80.2102
48,Washermanpet,0.25,0,13.1095,80.28701
14,Kilpauk,0.466667,0,13.08198,80.24448
19,Madipakkam,0.25,0,12.96448,80.2087
38,"Senji, Chennai",0.25,0,13.08362,80.28252


**Finally, let's visualize the resulting clusters**

In [89]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Cs_foodjointcluster['Latitude'], Cs_foodjointcluster['Longitude'], Cs_foodjointcluster['Neighborhood'], Cs_foodjointcluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [93]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [90]:
Cs_foodjointcluster.loc[Cs_foodjointcluster['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Food Joint,Cluster Labels,Latitude,Longitude
0,Alandur,0.375,0,13.00013,80.2006
23,Nazarethpettai,0.5,0,13.0371,80.05755
4,Chitlapakkam,0.333333,0,12.93277,80.14387
9,Iyyapanthangal,0.333333,0,13.04092,80.13649
8,Guduvancheri,0.5,0,12.8379,80.05327
1,Anna Nagar,0.375,0,13.08359,80.2102
48,Washermanpet,0.25,0,13.1095,80.28701
14,Kilpauk,0.466667,0,13.08198,80.24448
19,Madipakkam,0.25,0,12.96448,80.2087
38,"Senji, Chennai",0.25,0,13.08362,80.28252


#### Cluster 1

In [91]:
Cs_foodjointcluster.loc[Cs_foodjointcluster['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Food Joint,Cluster Labels,Latitude,Longitude
17,Korukkupet,0.0,1,13.1168,80.27726
24,Oragadam,0.0,1,13.13744,80.15383
27,Panambakkam,0.0,1,13.07761,80.15583
47,Virugambakkam,0.2,1,13.0559,80.19349
29,Peerkankaranai,0.0,1,12.91224,80.09895
18,Madambakkam,0.0,1,12.90529,80.15352
20,Maduravoyal,0.0,1,13.05841,80.16636
11,Kamarajapuram,0.0,1,13.12058,80.06047
40,Singaperumalkoil,0.166667,1,12.76333,80.0035
10,"K. K. Nagar, Chennai",0.0,1,13.11035,80.21301


#### Cluster 2

In [92]:
Cs_foodjointcluster.loc[Cs_foodjointcluster['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Food Joint,Cluster Labels,Latitude,Longitude
6,Cowl Bazaar,1.0,2,12.98861,80.151
28,Pattabiram,1.0,2,13.12333,80.05944
16,Kodungaiyur,1.0,2,13.13663,80.24479
22,Navalur,0.75,2,12.84584,80.22648


#### Observations:
The cluster map above suggests that most Food Joints are concentrated in the central area of Chennai, while the suburbs still have very few. The three clusters differ in the share of each neighborhood's returned venues that are Food Joints:

- Cluster 2 has the highest share (0.75 to 1.0 in the rows shown), so Food Joints there are likely suffering from intense competition due to oversupply.
- Cluster 0 has a moderate share (roughly 0.23 to 0.5 in the rows shown), which points to less competition.
- Cluster 1 has few or no Food Joints (0.0 to 0.2 in the rows shown).

This project therefore recommends that investors capitalize on these findings by opening new Food Joints in cluster 0 neighborhoods. Investors with a unique selling proposition that stands out from the competition can also consider cluster 1 neighborhoods, where Food Joints are sparse. Lastly, investors are advised to avoid cluster 2 neighborhoods, which already have a high concentration of Food Joints and intense competition.
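
The comparison between clusters can be checked with a short summary. The sketch below uses only the Food Joint shares visible in the three cluster tables of section 8, not the full 49 neighborhoods, so the figures are indicative. The helper name `summarize` is made up for illustration.

```python
from statistics import mean

def summarize(shares_by_cluster):
    """Return {cluster: (min, mean, max)} of the Food Joint shares."""
    return {
        cluster: (min(shares), mean(shares), max(shares))
        for cluster, shares in shares_by_cluster.items()
    }

# shares copied from the rows shown in the cluster tables above
shown_rows = {
    0: [0.375, 0.5, 0.333333, 0.333333, 0.5, 0.375, 0.25, 0.466667, 0.25, 0.25],
    1: [0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.166667, 0.0],
    2: [1.0, 1.0, 1.0, 0.75],
}

summary = summarize(shown_rows)
for cluster in sorted(summary):
    low, avg, high = summary[cluster]
    print("Cluster {}: min={:.2f} mean={:.2f} max={:.2f}".format(cluster, low, avg, high))
```

On these rows the mean share rises from cluster 1 to cluster 0 to cluster 2, which is the ordering the observations rely on.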

### End of Project