# Chloropleth Map Analysis on the Zomato Restaurants in Chennai


A dataset of all zomato restaurants in Chennai as of 1st June 2020 is listed in a Dataset which is available in the [Link](https://www.kaggle.com/phiitm/chennai-zomato-restaurants-data?select=Zomato+Chennai+Listing+2020.csv)  
Using the data from this dataset which was obtained by scraping through Zomato, a few Chloropleth maps are going to be created to the zone-wise map of Chennai.  
Chennai has 15 zones according to the [Greater Chennai Corporation](https://chennaicorporation.gov.in/zone/index.htm)  
The Chloropleth maps that will be created will highlight  
1. The Zones of Chennai by number of Restaurants   
2. The Zones of Chennai by Price for 2 in Restaurants - Will highlight Zones with more expensive restaurants  
3. The Zones of Chennai by number of restaurants serving a given cuisine as specified by viewer  
4. Which Zones in Chennai have similar characteristics based on restaurants they contain?  
K-Means Clustering will be used to create cluters for the last question

This Analysis will determine the feasibilty and popularity of opening a restaurant of a particular cuisine in Chennai

## 1.The Zones of Chennai by number of Restaurants

Import the basic Libraries

In [1]:
import pandas as pd
import numpy as np

The next step is to import the dataset into a dataframe

In [2]:
file = 'Zomato Chennai Listing 2020.csv'
df = pd.read_csv(file)
df.head()

Unnamed: 0,Zomato URL,Name of Restaurant,Address,Location,Cuisine,Top Dishes,Price for 2,Dining Rating,Dining Rating Count,Delivery Rating,Delivery Rating Count,Features
0,https://www.zomato.com/chennai/yaa-mohaideen-b...,Yaa Mohaideen Briyani,"336 & 338, Main Road, Pallavaram, Chennai",Pallavaram,['Biryani'],"['Bread Halwa', ' Chicken 65', ' Mutton Biryan...",500.0,4.3,1500,4.3,9306,"['Home Delivery', 'Indoor Seating']"
1,https://www.zomato.com/chennai/sukkubhai-biriy...,Sukkubhai Biriyani,"New 14, Old 11/3Q, Railway Station Road, MKN ...",Alandur,"['Biryani', ' North Indian', ' Mughlai', ' Des...","['Beef Biryani', ' Beef Fry', ' Paratha', ' Pa...",1000.0,4.4,3059,4.1,39200,"['Home Delivery', 'Free Parking', 'Table booki..."
2,https://www.zomato.com/chennai/ss-hyderabad-bi...,SS Hyderabad Biryani,"98/339, Arcot Road, Opposite Gokulam Chit Fun...",Kodambakkam,"['Biryani', ' North Indian', ' Chinese', ' Ara...","['Brinjal Curry', ' Tandoori Chicken', ' Chick...",500.0,4.3,1361,4.4,10500,"['Home Delivery', 'Indoor Seating']"
3,https://www.zomato.com/chennai/kfc-perambur,KFC,"10, Periyar Nagar, 70 Feet Road, Near Sheeba ...",Perambur,"['Burger', ' Fast Food', ' Finger Food', ' Bev...",['Zinger Burger'],500.0,4.0,1101,4.0,11200,"['Home Delivery', 'Free Parking', 'Card Upon D..."
4,https://www.zomato.com/chennai/tasty-kitchen-p...,Tasty Kitchen,"135B, SRP Colony, Peravallur, Near Perambur, ...",Perambur,"['Chinese', ' Biryani', ' North Indian', ' Che...","['Mutton Biryani', ' Chicken Rice', ' Tomato R...",450.0,4.2,617,4.1,22400,"['Home Delivery', 'Indoor Seating']"


Let us check how many restaurants are there in the dataset

In [3]:
df.shape[0]

12032

12032 restaurants are listed in Zomato in Chennai!

The Dataset contains details about as shown above. All these columns aren't required for our visualization.  
We require the location but not the address. Similarly we do not require the Zomato URL, Dining and Delivery ratings and number of rating, Features and Top Dishes.
The columns not required will be dropped.

In [4]:
df.drop(["Zomato URL","Address","Dining Rating","Top Dishes","Dining Rating Count","Delivery Rating","Delivery Rating Count","Features"],axis=1,inplace=True)


In [5]:
df.head()

Unnamed: 0,Name of Restaurant,Location,Cuisine,Price for 2
0,Yaa Mohaideen Briyani,Pallavaram,['Biryani'],500.0
1,Sukkubhai Biriyani,Alandur,"['Biryani', ' North Indian', ' Mughlai', ' Des...",1000.0
2,SS Hyderabad Biryani,Kodambakkam,"['Biryani', ' North Indian', ' Chinese', ' Ara...",500.0
3,KFC,Perambur,"['Burger', ' Fast Food', ' Finger Food', ' Bev...",500.0
4,Tasty Kitchen,Perambur,"['Chinese', ' Biryani', ' North Indian', ' Che...",450.0


Now we require all the names of Neighbourhoods in Chennai and which zone they fall in.  
All Neighbourhoods in Chennai are referred to as "Wards" according to the GCC.  
The list of all wards and which zone they fall under can be obtained from this [Link](https://www.livechennai.com/List-of-Chennai-Corporation-Wards.asp)

Let us now obtain a dataframe of all wards and respective zones from the Link.

In [6]:
pd.set_option('display.max_rows', None)
URL1 = "https://www.livechennai.com/List-of-Chennai-Corporation-Wards.asp"
dfs = pd.read_html(URL1,header=0)
dfwards = dfs[0]
dfwards.dropna(axis=0,inplace=True)
dfwards.reset_index(inplace=True,drop=True)
dfwards.head()

Unnamed: 0,Ward Name,Zone Name
0,KATHIVAKKAM,THIRUVOTTRIYUR (Zone 1)
1,ENNORE,THIRUVOTTRIYUR (Zone 1)
2,ERNAVOOR,THIRUVOTTRIYUR (Zone 1)
3,AJAX,THIRUVOTTRIYUR (Zone 1)
4,TIRUVOTTRIYUR,THIRUVOTTRIYUR (Zone 1)


In [7]:
zonelist = list(dfwards["Zone Name"])
zonenamelist=[]
zonelist1=[]
for i in range(len(dfwards)):
    zonenamelist.append(dfwards["Zone Name"][i].split(" "))
for i in range(len(dfwards)):
    zonelist1.append(zonenamelist[i][0])
dfwards["Zone"] = zonelist1
zonenumberlist=[]
zonenumber=[]
for i in range(len(dfwards)):
    zonenumberlist.append(dfwards["Zone Name"][i].split("("))
for i in range(len(dfwards)):
    zonenumber.append(zonenumberlist[i][1][:-1])
dfwards["Zone Number"] = zonenumber
dfwards.drop("Zone Name",axis=1,inplace=True)
dfwards.rename(columns={"Zone":"Zone Name"},inplace=True)
dfwards.head()

Unnamed: 0,Ward Name,Zone Name,Zone Number
0,KATHIVAKKAM,THIRUVOTTRIYUR,Zone 1
1,ENNORE,THIRUVOTTRIYUR,Zone 1
2,ERNAVOOR,THIRUVOTTRIYUR,Zone 1
3,AJAX,THIRUVOTTRIYUR,Zone 1
4,TIRUVOTTRIYUR,THIRUVOTTRIYUR,Zone 1


Let us verify that we have 15 zones as specified by the GCC

In [8]:
dfwards["Zone Name"].nunique()

15

We have to add a new column to the Zomato dataframe specifying the zone the restaurant falls under.

In [9]:
pd.set_option('display.max_rows', 10)
Zones = []
for i in range(len(df)):
    flag=0
    for j in range(len(dfwards)):
        if dfwards["Ward Name"][j].upper() in df["Location"][i].upper():
            Zones.append(dfwards["Zone Number"][j])
            flag+=1
            break
        elif dfwards["Zone Name"][j].upper() in df["Location"][i].upper():
            Zones.append(dfwards["Zone Number"][j])
            flag+=1
            break
        else:
            continue
    if flag == 0:
        Zones.append(np.nan)
            
df["Zone"] = Zones
df

Unnamed: 0,Name of Restaurant,Location,Cuisine,Price for 2,Zone
0,Yaa Mohaideen Briyani,Pallavaram,['Biryani'],500.0,
1,Sukkubhai Biriyani,Alandur,"['Biryani', ' North Indian', ' Mughlai', ' Des...",1000.0,Zone 12
2,SS Hyderabad Biryani,Kodambakkam,"['Biryani', ' North Indian', ' Chinese', ' Ara...",500.0,Zone 9
3,KFC,Perambur,"['Burger', ' Fast Food', ' Finger Food', ' Bev...",500.0,Zone 4
4,Tasty Kitchen,Perambur,"['Chinese', ' Biryani', ' North Indian', ' Che...",450.0,Zone 4
...,...,...,...,...,...
12027,CK's Sandwiches,Porur,['Sandwich'],350.0,Zone 11
12028,CK's Sandwiches,Kolathur,['Sandwich'],350.0,Zone 6
12029,CK's Sandwiches,Anna Nagar East,['Sandwich'],350.0,Zone 8
12030,CK's Sandwiches,Ramapuram,['Sandwich'],350.0,Zone 11


You may Notice the First value of Zone is NaN.   
This is because Pallavaram falls outside Greater Chennai Corporation. It is infact in Chengalpattu District.  
A huge number of restaurants in the list do not fall under Chennai City limits but fall under Chennai Metropolitan Area. These Restaurants are included in the Zomato Dataset as they classify Chennai based on the Metropolitan Area and not Chennai District.  
We will plot a Chloropleth map of only the restaurants inside Chennai City limits.
Let us see how many restaurants are excluded as a result of this.

In [10]:
sum(df["Zone"].isna())

4427

4427 restaurants out of the 12302 restaurants fall outside the city limits!  
Let us see how many restaurants zomato has listed inside the city limits

In [11]:
len(df)-sum(df["Zone"].isna())

7605

Chennai Corporation encompasses an Area of 426 sq.km. and there are 7605 restaurants.  
Which means there are roughly 18 restaurants every sq.km on average within the city limits that Zomato has in their Listing.

After created a checkpoint dataframe, we will remove the Restaurants from the table which are not within city limits.  


In [12]:
df1 = df.copy()

In [13]:
df1.dropna(subset=["Zone"],inplace=True,axis=0)
df1.reset_index(inplace=True,drop=True)
df1

Unnamed: 0,Name of Restaurant,Location,Cuisine,Price for 2,Zone
0,Sukkubhai Biriyani,Alandur,"['Biryani', ' North Indian', ' Mughlai', ' Des...",1000.0,Zone 12
1,SS Hyderabad Biryani,Kodambakkam,"['Biryani', ' North Indian', ' Chinese', ' Ara...",500.0,Zone 9
2,KFC,Perambur,"['Burger', ' Fast Food', ' Finger Food', ' Bev...",500.0,Zone 4
3,Tasty Kitchen,Perambur,"['Chinese', ' Biryani', ' North Indian', ' Che...",450.0,Zone 4
4,Cafe Arabica,Anna Nagar East,"['Cafe', ' Fast Food']",800.0,Zone 8
...,...,...,...,...,...
7600,CK's Sandwiches,Porur,['Sandwich'],350.0,Zone 11
7601,CK's Sandwiches,Kolathur,['Sandwich'],350.0,Zone 6
7602,CK's Sandwiches,Anna Nagar East,['Sandwich'],350.0,Zone 8
7603,CK's Sandwiches,Ramapuram,['Sandwich'],350.0,Zone 11


We have all the data required to proceed to the next step.  
The next step is preparing the data to be compatible with the GeoJSON file used for the Chloropleth Map.

In [14]:
pd.set_option('display.max_rows', None)
df_values = pd.DataFrame(df1["Zone"].value_counts())
df_values.reset_index(inplace=True)
df_values.rename(columns={"index":"Zone No","Zone":"Count"},inplace=True)
df_values

Unnamed: 0,Zone No,Count
0,Zone 13,1086
1,Zone 9,1040
2,Zone 5,898
3,Zone 11,897
4,Zone 14,704
5,Zone 8,676
6,Zone 10,574
7,Zone 15,390
8,Zone 6,376
9,Zone 7,365


According to the GeoJSON file the Zone numbers are in a jumbled order.  
We will have to arrange our table according to the GeoJSON file.
In the GeoJSON file, we have an added zone for St.Thomas Mount. This Locality is a hill in Chennai and hence doesn't have a Zone number and is represented by a Zone Name of "St.Thomas Mount".
Moreover Zone 1 and Zone 2 are absent from the above dataframe as they have no restaurants under Zomato. Those rows will be added for the sake of thr GeoJSON file.


In [15]:
new_rows = [{'Zone No':"St. Thomas Mount", "Count":0},{'Zone No':"Zone 1", "Count":0},{'Zone No':"Zone 2", "Count":0}]
for i in range(len(new_rows)):
    df_values = df_values.append(new_rows[i], ignore_index=True)
df_values


Unnamed: 0,Zone No,Count
0,Zone 13,1086
1,Zone 9,1040
2,Zone 5,898
3,Zone 11,897
4,Zone 14,704
5,Zone 8,676
6,Zone 10,574
7,Zone 15,390
8,Zone 6,376
9,Zone 7,365


In [16]:
df_values = pd.DataFrame(df_values,index=[13,14,15,12,10,1,2,8,9,5,6,3,11,0,4,7])
df_values.reset_index(inplace=True)

In [17]:
df_values.drop("index",inplace=True,axis=1)
df_values

Unnamed: 0,Zone No,Count
0,St. Thomas Mount,0
1,Zone 1,0
2,Zone 2,0
3,Zone 3,61
4,Zone 4,277
5,Zone 9,1040
6,Zone 5,898
7,Zone 6,376
8,Zone 7,365
9,Zone 8,676


Our Data is ready for the first Chloropleth Map

In order to Visualize maps, we should install Folium

In [18]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed and imported!')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed and imported!


Now let's import the GeoJson File!

And then plot the map!

In [19]:
chennai_geo = r'https://raw.githubusercontent.com/datameet/Municipal_Spatial_Data/master/Chennai/Zones.geojson'

latitude = 13.0000
longitude = 80.2707

chennai_map = folium.Map(location=[latitude, longitude], zoom_start=11)

chennai_map.choropleth(
    geo_data=chennai_geo,
    data=df_values,
    columns=['Zone No','Count'],
    key_on='feature.properties.Zone Name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Number of Restaurants by Zone'
)

chennai_map

The density of restaurants is seen in the Map. It is very clear that Central and West Chennai have higher restaurant density.

## 2.The Zones of Chennai by Price for 2 in Restaurants   
Will highlight Zones with more expensive restaurants

Let us first Modify our df_values dataframe to include the Average price for 2 as a column

In [20]:
sums=np.linspace(0,0,num=(len(df_values)))
df_values["sums_"] = sums
for i in range(len(df)):
    for j in range(len(df_values)):
        if df["Zone"][i] == df_values["Zone No"][j]:
            df_values["sums_"][j]+=df["Price for 2"][i]
avg=[]
for i in range(len(df_values)):
    if df_values["Count"][i]==0:
        avg.append(0)
    else:
        avg.append(np.round(df_values["sums_"][i]/df_values["Count"][i],2))
df_values["Average"]=avg


            

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [21]:
df_values

Unnamed: 0,Zone No,Count,sums_,Average
0,St. Thomas Mount,0,0.0,0.0
1,Zone 1,0,0.0,0.0
2,Zone 2,0,0.0,0.0
3,Zone 3,61,19850.0,325.41
4,Zone 4,277,89560.0,323.32
5,Zone 9,1040,558680.0,537.19
6,Zone 5,898,335050.0,373.11
7,Zone 6,376,136750.0,363.7
8,Zone 7,365,118340.0,324.22
9,Zone 8,676,258920.0,383.02


Let us plot the next chloropleth map showing the Average Price for 2 by zone in Chennai.  
This shows the spending pattern of people in different zones of Chennai.  

In [22]:
chennai_geo = r'https://raw.githubusercontent.com/datameet/Municipal_Spatial_Data/master/Chennai/Zones.geojson'

latitude = 13.0000
longitude = 80.2707

chennai_map = folium.Map(location=[latitude, longitude], zoom_start=11)

chennai_map.choropleth(
    geo_data=chennai_geo,
    data=df_values,
    columns=['Zone No','Average'],
    key_on='feature.properties.Zone Name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Average Price for 2 in Zomato Restaurants by Zone in Chennai'
)
chennai_map

We can observe a pattern in the Map.  
The more close to the centre of the city, the price in restaurants in Higher.  
The Average price for 2 in a restaurant decreases concentrically from the center of the City.
With the exception of Shollinganallur in the south which has higher prices than expected possibly as a result of the Highly paid professionals living in the zone as it is the IT hub.

## The Zones of Chennai by number of restaurants serving a given cuisine as specified by viewer

Let us Move onto the Next map!  
Here a map will be plotted according to the cuisine specified by the user.  
The map will show which zone contains more restaurants serving the particular cuisine.

In [23]:
for i in range(len(df1)):
    df1["Cuisine"][i] = df["Cuisine"][i].replace("[","").replace("]","").replace("'","").split(",")



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [24]:
for i in range(len(df1)):
    for j in range(len(df1["Cuisine"][i])):
        df1["Cuisine"][i][j] = df1["Cuisine"][i][j].strip()
df1.head()

Unnamed: 0,Name of Restaurant,Location,Cuisine,Price for 2,Zone
0,Sukkubhai Biriyani,Alandur,[Biryani],1000.0,Zone 12
1,SS Hyderabad Biryani,Kodambakkam,"[Biryani, North Indian, Mughlai, Desserts, Bev...",500.0,Zone 9
2,KFC,Perambur,"[Biryani, North Indian, Chinese, Arabian]",500.0,Zone 4
3,Tasty Kitchen,Perambur,"[Burger, Fast Food, Finger Food, Beverages]",450.0,Zone 4
4,Cafe Arabica,Anna Nagar East,"[Chinese, Biryani, North Indian, Chettinad, Ar...",800.0,Zone 8


Let us now Generate a list of all the cuisines available in Zomato restaurants in Chennai

In [25]:
cuisinelist=[]
for i in range(len(df1)):
    for j in df1["Cuisine"][i]:
        if j not in cuisinelist:
            cuisinelist.append(j)
cuisinelist
    

['Biryani',
 'North Indian',
 'Mughlai',
 'Desserts',
 'Beverages',
 'Chinese',
 'Arabian',
 'Burger',
 'Fast Food',
 'Finger Food',
 'Chettinad',
 'South Indian',
 'Cafe',
 'Lebanese',
 'Salad',
 'Seafood',
 'Italian',
 'Hyderabadi',
 'Kerala',
 'Continental',
 'Asian',
 'Andhra',
 'Street Food',
 'Kebab',
 'Mithai',
 'Pizza',
 'Malaysian',
 'American',
 'BBQ',
 'Rolls',
 'Bakery',
 'Tamil',
 'Middle Eastern',
 'Ice Cream',
 'Singaporean',
 'European',
 'Mexican',
 'Rajasthani',
 'Burmese',
 'Thai',
 'Vietnamese',
 'Indonesian',
 'Japanese',
 'Momos',
 'Healthy Food',
 'Juices',
 'Sushi',
 'Sandwich',
 'Mediterranean',
 'Konkan',
 'Mangalorean',
 'Gujarati',
 'Spanish',
 'Steak',
 'Maharashtrian',
 'Modern Indian',
 'Wraps',
 'Korean',
 'French',
 'Irish',
 'Bar Food',
 'Tea',
 'Tibetan',
 'Parsi',
 'Iranian',
 'Bengali',
 'Naga',
 'Sri Lankan',
 'Malwani',
 'Moroccan',
 'Egyptian',
 'Turkish',
 'Russian',
 'Portuguese',
 'British',
 'Nepalese',
 'Greek',
 'Coffee',
 'Roast Chicken',


Seems like a lot. How many are there?

In [26]:
len(cuisinelist)

92

92 cuisines!
That's what I call a foodie haven!

In [27]:
df_values.drop("sums_",axis=1,inplace=True)

Now let us write a function which upon taking the argument of a particular cuisine, will modify df_values such that it fills a column with the count of the number of restaurants serving that cuisine in each zone.
Let us make the function also plot the map.

In [28]:
def cuisinefill(x="Italian"):                                     #If no argument specified Italian is the default argument
    Cuisine=[]
    for j in range(len(df_values)):
        count = 0
        for i in range(len(df1)):
            if df1["Zone"][i] == df_values["Zone No"][j]:
                if x in df1["Cuisine"][i]:
                    count+=1
        Cuisine.append(count)
    df_values["UserCuisine"] = Cuisine
    chennai_geo = r'https://raw.githubusercontent.com/datameet/Municipal_Spatial_Data/master/Chennai/Zones.geojson'

    latitude = 13.0000
    longitude = 80.2707

    chennai_map = folium.Map(location=[latitude, longitude], zoom_start=11)

    chennai_map.choropleth(
        geo_data=chennai_geo,
        data=df_values,
        columns=['Zone No','UserCuisine'],
        key_on='feature.properties.Zone Name',
        fill_color='YlOrRd', 
        fill_opacity=0.7, 
        line_opacity=0.2,
        legend_name= str(x)+' Restaurants listed by Zomato in Chennai by Zone'
    )
    return chennai_map

The Function is defined. Let us now plot the map!  
Please specify the cuisine you wish to view in the cell below. If you do not specify a cuisine, the default cuisine is Italian.

In [29]:
Cuisine = "Italian"  #Specify the cuisine you wish to see in place of Italian and run the current and next cell

In [30]:
y = cuisinefill(Cuisine)
y

A chloropleth map of your favourite cuisine in my favourite city is generated! :)

## Which Zones in Chennai have similar characteristics based on restaurants they contain? 

Now it is time to create a map showing which relates all Zones by similarity.

Lets call this dataframe df2

In [33]:
df2 = pd.DataFrame()
df2['Zone'] = df_values['Zone No']
for i in cuisinelist:
    df2[i] = np.zeros(len(df2))
df2    

Unnamed: 0,Zone,Biryani,North Indian,Mughlai,Desserts,Beverages,Chinese,Arabian,Burger,Fast Food,...,Grill,Bubble Tea,Raw Meats,Paan,Mishti,Charcoal Chicken,Odia,Bihari,Goan,Belgian
0,St. Thomas Mount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Zone 1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Zone 2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Zone 3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Zone 4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Zone 9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Zone 5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Zone 6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Zone 7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Zone 8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Lets fill all the columns with the number of restaurants in each cuisine in the particular zone

In [34]:
for i in range(len(df1)):
    for j in range(len(df2)):
        if df1['Zone'][i] == df2['Zone'][j]:
            for k in df1['Cuisine'][i]:
                df2[k][j]+=1
        else:
            continue
df2

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """


Unnamed: 0,Zone,Biryani,North Indian,Mughlai,Desserts,Beverages,Chinese,Arabian,Burger,Fast Food,...,Grill,Bubble Tea,Raw Meats,Paan,Mishti,Charcoal Chicken,Odia,Bihari,Goan,Belgian
0,St. Thomas Mount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Zone 1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Zone 2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Zone 3,13.0,12.0,1.0,3.0,7.0,19.0,1.0,4.0,17.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Zone 4,51.0,76.0,8.0,17.0,46.0,110.0,10.0,11.0,77.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Zone 9,179.0,342.0,47.0,104.0,195.0,355.0,35.0,27.0,244.0,...,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
6,Zone 5,133.0,306.0,27.0,59.0,138.0,330.0,30.0,18.0,187.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
7,Zone 6,62.0,116.0,12.0,33.0,74.0,134.0,13.0,12.0,101.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
8,Zone 7,63.0,118.0,12.0,28.0,76.0,124.0,9.0,4.0,97.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Zone 8,108.0,206.0,23.0,51.0,127.0,238.0,30.0,17.0,169.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


Now lets perform k-means clustering to find out which zones have a similar restaurant scene

In [36]:
from sklearn.cluster import KMeans

kclusters = 5

df2_clustering = df2.drop('Zone', 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df2_clustering)

kmeans.labels_

array([2, 2, 2, 2, 0, 4, 1, 0, 0, 3, 3, 1, 0, 4, 3, 0])

In [45]:
cluster_df = pd.DataFrame()
cluster_df['Zone'] = df2['Zone']
cluster_df['Cluster'] = kmeans.labels_
cluster_df

Unnamed: 0,Zone,Cluster
0,St. Thomas Mount,2
1,Zone 1,2
2,Zone 2,2
3,Zone 3,2
4,Zone 4,0
5,Zone 9,4
6,Zone 5,1
7,Zone 6,0
8,Zone 7,0
9,Zone 8,3


Now lets create a map

In [48]:
chennai_geo = r'https://raw.githubusercontent.com/datameet/Municipal_Spatial_Data/master/Chennai/Zones.geojson'

latitude = 13.0000
longitude = 80.2707

chennai_map = folium.Map(location=[latitude, longitude], zoom_start=11)

chennai_map.choropleth(
    geo_data=chennai_geo,
    data=cluster_df,
    columns=['Zone','Cluster'],
    key_on='feature.properties.Zone Name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Zones with similar Restaurant and Cuisine popularity'
)

chennai_map

From the above map, it is clear which neighbourhoods have a similar pattern of restaurants and where it is better to open new restaurants given the existence and popularity of a cuisine in one.

This is the end of the report. I would really appreciate some constructive criticism on my approach and feel free to contact me at ajay.rangan@gmail.com for any discussion.