# 🍽️ Mumbai Restaurant Finder Project – A Foodie's Data-Driven Quest!

Hello there!👋 I'm Priyanka — a passionate foodie and an even more passionate data analyst in the making. This project is a delightful mix of both my worlds: food and data.

While most people scroll endlessly through food delivery apps or ask friends for restaurant recommendations, I decided to take the geekier route — building an **interactive dashboard to find restaurants in Mumbai that suit my taste**, using Power BI. But before we get to the juicy, visual goodness 🍕🍰🍷, it all starts here — with some good old data cleaning 🧹.

So let's see what's on the menu, i.e. the dataset I'm working with: the **Zomato Mumbai dataset**, originally sourced from Kaggle. It's about **four years old**, so while the data might have some vintage charm like a fine wine 🥂, you might want to check that a restaurant still exists before you show up there craving sweets at midnight. 😅

This CSV file contains:
- 🏠 Restaurant name
- 📍 Address and the Locality in Mumbai
- 🍽️ Cuisines offered
- ⏰ Opening hours
- 💸 Price range
- ⭐ Ratings
- 🗳️ Votes
- 📝 Reviews
- 🔗 URL

But before diving into analysis or fancy visuals, I had to freshen this data up with some cleaning.

First up: duplicates. I removed repeated entries because finding the same restaurant listed five times doesn’t make the food any tastier. Then came the missing values. Blank fields are like ordering pizza🍕 and getting just the crust. I also translated the review labels that were in other languages (like turning “Velmi dobré” or “Muy Bueno” into “Very Good”), because good food speaks all tongues, but clean data only speaks one. Trimming and formatting the address column✂️ was also necessary because a long, messy address can leave the user (mainly me😅) very confused. And lastly, I made sure the categorical columns were neat, uniform, and ready to serve.

As someone who lives for late-night cravings😋, Saturday party nights, impromptu brunches, and the occasional cake binge, I thought: why not build something that helps me (and others like me) discover places that match our vibe and appetite? This isn't just a project, it's my **taste buds meeting tech**.

This entire project is not just a technical exercise, it’s something I genuinely enjoyed❤️ doing, because it ties back to my personality and passion. It’s data that means something to me. Whether it’s a midnight craving or a brunch plan, this dashboard (once done) might just be my go-to way to discover new places in Mumbai, and I hope it helps others too!🙌
  
So, welcome to my foodie-fueled adventure. Let’s clean some data and eventually, find some seriously good eats.

Let’s dive in.

## Import Python Libraries

In [4]:
import pandas as pd
import numpy as np

## Import The Dataset
I imported the dataset, a CSV file, using pandas. In this particular dataset, the delimiter is not ',' but '|'.

These are the small things one only discovers after opening the file. The read_csv() function also accepts other arguments, for example to keep only the columns you need while loading, but I wanted to take everything slowly, step by step.

In [5]:
df=pd.read_csv('Zomato_Mumbai_Dataset.csv', delimiter='|')
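
A small sketch of what such extra read_csv() arguments can look like. The inline sample text and the choice of columns are made up for illustration; `sep` (of which `delimiter` is an alias) and `usecols` are regular pandas parameters.

```python
import io
import pandas as pd

# Hypothetical sample in the same pipe-delimited layout as the real file.
raw = "NAME|PRICE|CITY|PAGE NO\nHitchki|1200|Mumbai|1\nBaba Falooda|400|Mumbai|1\n"

# sep='|' sets the delimiter; usecols keeps only the listed columns while reading.
sample = pd.read_csv(io.StringIO(raw), sep='|', usecols=['NAME', 'PRICE'])
print(sample)
```

Loading only the needed columns is mostly a convenience for larger files; for this dataset, reading everything and dropping columns later works just as well.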

## Check The Data
I note all the available columns and the kind of content they hold. Looking at the top 5 rows with the head() function gives me a first impression of the data. The sample(5) function picks 5 random rows instead, which helps when the first rows are not representative of the rest, for example when the file is sorted.

In [6]:
df.head()

| | NAME | PRICE | CUSINE_CATEGORY | CITY | REGION | URL | PAGE NO | CUSINE TYPE | TIMING | RATING_TYPE | RATING | VOTES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hitchki | 1200 | Modern Indian,North Indian,Chinese,Momos,Birya... | Mumbai | First International Financial Centre-- Bandra ... | https://www.zomato.com/mumbai/hitchki-bandra-k... | 1 | Casual Dining | 12noon to 130am(Mon-Sun) | Excellent | 4.9 | 3529 |
| 1 | Baba Falooda | 400 | Desserts,Ice Cream,Beverages | Mumbai | Mahim | https://www.zomato.com/mumbai/baba-falooda-mah... | 1 | Dessert Parlor | 2pm to 1am(Mon-Sun) | Very Good | 4.4 | 1723 |
| 2 | Chin Chin Chu | 1800 | Asian,Chinese | Mumbai | Juhu | https://www.zomato.com/mumbai/chin-chin-chu-ju... | 1 | Casual Dining | 12noon to 1am(Mon-Sun) | Very Good | 4.2 | 337 |
| 3 | Butterfly High | 1000 | Modern Indian | Mumbai | Bandra Kurla Complex | https://www.zomato.com/mumbai/butterfly-high-b... | 1 | Bar | 12noon to 130am(Mon-Sun) | Very Good | 4.3 | 1200 |
| 4 | BKC DIVE | 1200 | North Indian,Chinese,Continental | Mumbai | Bandra Kurla Complex | https://www.zomato.com/mumbai/bkc-dive-bandra-... | 1 | Bar | 1130am to 1am(Mon-Sun) | Veľmi dobré | 4.4 | 5995 |


In [7]:
df.sample(5)

| | NAME | PRICE | CUSINE_CATEGORY | CITY | REGION | URL | PAGE NO | CUSINE TYPE | TIMING | RATING_TYPE | RATING | VOTES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4290 | Suyog Restaurant & Bar | 750 | North Indian,Chinese,Mughlai,Seafood | Mumbai | Wadala | https://www.zomato.com/mumbai/suyog-restaurant... | 340 | Casual Dining | 1130am to 3pm,630pm to 12midnight(Mon-Sun) | Average | 3.4 | 71 |
| 14764 | Ebony Fine-Dine | 1600 | Continental,Chinese,North Indian,Seafood | Mumbai | Kandivali East | https://www.zomato.com/ebony/info | 93 | Casual Dining | 1130am to 3pm,7pm to 1230AM(Mon-Sun) | Good | 3.8 | 767 |
| 9350 | Agraa Bazaar | 300 | Fast Food,North Indian | Mumbai | Borivali West | https://www.zomato.com/mumbai/agraa-bazaar-bor... | 625 | Quick Bites | 830am to 3pm,4pm to 1030pm(Mon-Sun) | Average | 3.4 | 12 |
| 14082 | Bombay Foods | 200 | Sandwich | Mumbai | Near Andheri East Station | https://www.zomato.com/mumbai/bombay-foods-nea... | 892 | none | 10am to 10pm(Mon-Sun) | NaN | NEW | NEW |
| 8045 | A Vanilla Bean | 400 | Bakery,Desserts | Mumbai | Khar | https://www.zomato.com/avanillabean/info | 551 | Bakery | 9am to 7pm(Mon-Sun) | Very Good | 4.0 | 120 |


## How big is the data
The shape attribute gives the number of rows and columns in the dataset. This helps me judge whether the data is too large to be handled comfortably.

In [8]:
df.shape

(15081, 12)

## Types of Data
The info() function shows the data type of each column and how many non-null values it has. This helps me understand what kind of data I am dealing with and whether I have to change the data type of a particular column.

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15081 entries, 0 to 15080
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   NAME             15081 non-null  object
 1   PRICE            15080 non-null  object
 2   CUSINE_CATEGORY  15079 non-null  object
 3   CITY             15080 non-null  object
 4   REGION           15080 non-null  object
 5   URL              15080 non-null  object
 6   PAGE NO          15080 non-null  object
 7   CUSINE TYPE      15080 non-null  object
 8   TIMING           15015 non-null  object
 9   RATING_TYPE      14070 non-null  object
 10  RATING           15080 non-null  object
 11  VOTES            15080 non-null  object
dtypes: object(12)
memory usage: 1.4+ MB


## DataType Change
I can see that every column has the datatype object, which would cause problems when transforming the data. Thus, I converted the PRICE, RATING and VOTES columns to numeric. With errors='coerce', any value that cannot be parsed as a number (such as the 'NEW' rating seen above) becomes NaN instead of raising an error. To confirm that the change was successful, I run the info() function again.

In [10]:
#df['PRICE'] = df['PRICE'].astype(int)
#df['RATING'] = df['RATING'].astype(float)
#df['VOTES'] = df['VOTES'].astype(int)

df['PRICE']=pd.to_numeric(df['PRICE'], errors='coerce')
df['RATING']=pd.to_numeric(df['RATING'], errors='coerce')
df['VOTES']=pd.to_numeric(df['VOTES'], errors='coerce')
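
To see what errors='coerce' does, here is a minimal sketch on a handful of made-up values modelled on the RATING column. It also counts how many entries were text before the conversion and are NaN after it, which is a quick way to check how much data the coercion blanks out.

```python
import pandas as pd

# Hypothetical values modelled on the RATING column, including the 'NEW' placeholder.
ratings = pd.Series(['4.9', '4.4', 'NEW', None, '3.4'])

# Unparseable strings become NaN instead of raising a ValueError.
converted = pd.to_numeric(ratings, errors='coerce')

# Entries that were present as text but could not be parsed as numbers.
newly_missing = ratings.notna() & converted.isna()

print(converted.tolist())
print("coerced to NaN:", int(newly_missing.sum()))
```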

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15081 entries, 0 to 15080
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   NAME             15081 non-null  object 
 1   PRICE            14138 non-null  float64
 2   CUSINE_CATEGORY  15079 non-null  object 
 3   CITY             15080 non-null  object 
 4   REGION           15080 non-null  object 
 5   URL              15080 non-null  object 
 6   PAGE NO          15080 non-null  object 
 7   CUSINE TYPE      15080 non-null  object 
 8   TIMING           15015 non-null  object 
 9   RATING_TYPE      14070 non-null  object 
 10  RATING           10768 non-null  float64
 11  VOTES            10768 non-null  float64
dtypes: float64(3), object(9)
memory usage: 1.4+ MB


## Remove Duplicates
There is no point in seeing the same restaurant on the dashboard more than once. So I first count the duplicate rows, and then take the mean of the same True/False mask to see what fraction of the data will be removed (about 6.2%).

When removing the duplicates, I kept the first occurrence of each row so that no restaurant is lost entirely.

In [12]:
df.duplicated().sum()

np.int64(941)

In [13]:
df.duplicated().mean()

np.float64(0.06239639281214773)

In [14]:
df=df.drop_duplicates(keep='first')

In [15]:
df.duplicated().sum()

np.int64(0)
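
One thing to keep in mind: duplicated() without arguments only flags rows that match in every column. The sketch below, on a made-up three-row frame, shows how the `subset` parameter could catch a restaurant that appears twice with a different 'PAGE NO'. Whether the real file contains such cases is an assumption I have not verified here.

```python
import pandas as pd

# Hypothetical rows: the same restaurant listed on two different pages.
toy = pd.DataFrame({
    'NAME':    ['Hitchki', 'Hitchki', 'Baba Falooda'],
    'REGION':  ['Bandra Kurla Complex', 'Bandra Kurla Complex', 'Mahim'],
    'PAGE NO': ['1', '7', '1'],
})

# Full-row comparison: the differing PAGE NO hides the repeat.
full_row_dupes = int(toy.duplicated().sum())

# Comparing only the identifying columns finds it.
key_dupes = int(toy.duplicated(subset=['NAME', 'REGION']).sum())

# keep='first' retains the first occurrence of each NAME/REGION pair.
deduped = toy.drop_duplicates(subset=['NAME', 'REGION'], keep='first')

print(full_row_dupes, key_dupes, len(deduped))
```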

## Rename Columns
I renamed some columns, partly to make the data easier for me to understand and partly to fix spelling mistakes.

And as a safe practice, I always recheck if the change has been made and updated.

In [16]:
df.rename({'RATING_TYPE':'REVIEW'}, axis=1, inplace=True)

In [17]:
df.columns

Index(['NAME', 'PRICE', 'CUSINE_CATEGORY', 'CITY', 'REGION', 'URL', 'PAGE NO',
       'CUSINE TYPE', 'TIMING', 'REVIEW', 'RATING', 'VOTES'],
      dtype='object')

In [18]:
df.rename({'CUSINE_CATEGORY' : 'CUISINE_CATEGORY', 'CUSINE TYPE' : 'RESTAURANT_TYPE'}, axis=1, inplace=True)

In [19]:
df.columns

Index(['NAME', 'PRICE', 'CUISINE_CATEGORY', 'CITY', 'REGION', 'URL', 'PAGE NO',
       'RESTAURANT_TYPE', 'TIMING', 'REVIEW', 'RATING', 'VOTES'],
      dtype='object')

## Checking The Review Column
The review labels come in many forms. This dataset has them in several languages besides English (Spanish, Portuguese, Czech and Turkish, among others), so I will convert them to one standard scale: Excellent, Very Good, Good, Average and Poor.
The unique() function gives the unique values in the column.

In [20]:
df['REVIEW'].unique()

array(['Excellent', 'Very Good', 'Veľmi dobré', 'RATING_TYPE', 'Good',
       'Velmi dobré', 'Not rated', nan, 'Average', 'Excelente',
       'Muito Bom', 'Poor', 'Skvělá volba', 'Çok iyi', 'Baik',
       'Bardzo dobrze', 'Bom', 'Média', 'Dobrze', 'Buono', 'İyi', 'Bueno',
       'Ortalama', 'Skvělé', 'Biasa', 'Průměr', 'Sangat Baik', 'Priemer',
       'Dobré', 'Promedio', 'Muy Bueno', 'Media'], dtype=object)

## Finding Suspicious Content
In the output above, there is a value called 'RATING_TYPE'. After finding the same thing in other columns, I figured out that some rows repeat the column names instead of holding real data. I blanked out the NAME value in those rows and then dropped every row that contains a null value. Note that this also removes restaurants that have no rating or votes yet. Finally, I used the shape attribute to see how many rows I am left with.

In [21]:
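# Set NAME to NaN in rows where it holds the literal header text 'NAME'; the dropna() below removes them
df['NAME'] = df['NAME'].drop(df[df['NAME'] == 'NAME'].index)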

In [22]:
df.dropna(inplace = True)

In [23]:
df.shape

(10761, 12)
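
A more direct way to spot such stray header rows is to test whether each cell equals its own column name. This is only a sketch on a made-up frame, and it assumes the check runs while the columns are still text, i.e. before the numeric conversion.

```python
import pandas as pd

# Hypothetical frame with one repeated header row in the middle.
toy = pd.DataFrame({
    'NAME':        ['Hitchki', 'NAME', 'Baba Falooda'],
    'REGION':      ['Bandra Kurla Complex', 'REGION', 'Mahim'],
    'RATING_TYPE': ['Excellent', 'RATING_TYPE', 'Very Good'],
})

# A row counts as a stray header when every cell equals its own column name.
is_header = pd.Series(True, index=toy.index)
for col in toy.columns:
    is_header &= toy[col].astype(str).eq(col)

# Keep everything that is not a header row.
cleaned = toy[~is_header].reset_index(drop=True)
print(cleaned)
```

Filtering with a mask like this removes only the header rows, so it can be combined with a more selective dropna() if unrated restaurants should stay in the data.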

## Remove Unwanted Columns
No one wants garbage accumulating and taking up valuable space. Thus, I decided to keep what is useful and delete the unwanted columns. This reduces storage consumption and makes operations smoother and faster.

I need to remove the 'PAGE NO' column, as it holds the page number of the listing on Zomato, which is irrelevant to me and the analysis.

I need to remove the 'CITY' column, as the whole column consists of the word 'Mumbai': this dataset only covers restaurants in Mumbai.

In [24]:
df=df.drop(columns=['PAGE NO'])

In [25]:
df=df.drop(columns=['CITY'])

## The Review Column
Finally, I can start cleaning the 'REVIEW' column. I first list the values in the column, and since many of the labels are in languages other than English, I look up the meaning of each one and assign it the appropriate English label.

In [26]:
df['REVIEW'].unique()

array(['Excellent', 'Very Good', 'Veľmi dobré', 'Good', 'Velmi dobré',
       'Average', 'Excelente', 'Muito Bom', 'Poor', 'Skvělá volba',
       'Çok iyi', 'Baik', 'Bardzo dobrze', 'Bom', 'Média', 'Dobrze',
       'Buono', 'İyi', 'Bueno', 'Ortalama', 'Skvělé', 'Biasa', 'Průměr',
       'Sangat Baik', 'Priemer', 'Dobré', 'Promedio', 'Muy Bueno',
       'Media'], dtype=object)

In [27]:
df['REVIEW']=df['REVIEW'].replace('Excelente' , 'Excellent', regex=True)
df['REVIEW']=df['REVIEW'].replace('Veľmi dobré|Bardzo dobrze|Muy Bueno|Muito Bom|Muito Good|Sangat Baik|Velmi dobré' , 'Very Good', regex=True)
df['REVIEW']=df['REVIEW'].replace('Skvělá volba|Dobrze|Bueno|Buono|Dobré|Bom|Skvělé' , 'Good', regex=True)
df['REVIEW']=df['REVIEW'].replace('Priemer|Média|Media|Çok iyi|Biasa|Baik' , 'Average', regex=True)
df['REVIEW']=df['REVIEW'].replace('Průměr|Promedio|Ortalama|İyi|Media' , 'Poor', regex=True)

In [28]:
df['REVIEW'].unique()

array(['Excellent', 'Very Good', 'Good', 'Average', 'Poor'], dtype=object)

In [29]:
df['REVIEW'].value_counts()

REVIEW
Average      5116
Good         4344
Very Good    1150
Excellent      96
Poor           55
Name: count, dtype: int64
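
An exact-match dictionary is an alternative to the chained regex replacements and makes each translation easy to audit. The sketch below is hedged on two counts: the translations are my best reading of each label and should be double-checked, and a few of them differ from the regex grouping above (for example, 'Çok iyi' reads as Turkish for "very good" and 'Promedio' as Spanish for "average"), so the counts shown above would change if this mapping were used. The two Czech labels 'Skvělá volba' and 'Skvělé' are left out on purpose because I am not sure which English tier they belong to.

```python
import pandas as pd

# Assumed translations; verify each one before relying on it.
REVIEW_MAP = {
    'Excelente': 'Excellent',
    'Veľmi dobré': 'Very Good', 'Velmi dobré': 'Very Good',
    'Bardzo dobrze': 'Very Good', 'Muy Bueno': 'Very Good',
    'Muito Bom': 'Very Good', 'Sangat Baik': 'Very Good', 'Çok iyi': 'Very Good',
    'Dobrze': 'Good', 'Bueno': 'Good', 'Buono': 'Good', 'Dobré': 'Good',
    'Bom': 'Good', 'Baik': 'Good', 'İyi': 'Good',
    'Priemer': 'Average', 'Průměr': 'Average', 'Média': 'Average',
    'Media': 'Average', 'Promedio': 'Average', 'Ortalama': 'Average',
    'Biasa': 'Average',
}
ENGLISH = {'Excellent', 'Very Good', 'Good', 'Average', 'Poor'}

# Hypothetical sample of labels, one of them deliberately unmapped.
labels = pd.Series(['Excellent', 'Çok iyi', 'Baik', 'Promedio', 'Skvělé'])

# With a dict, Series.replace swaps whole values only, never substrings.
mapped = labels.replace(REVIEW_MAP)

# Anything still outside the English scale needs a manual decision.
leftovers = sorted(set(mapped.dropna()) - ENGLISH)
print(mapped.tolist())
print("unmapped:", leftovers)
```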

## The Region Column
First, let's see how many regions the dataset covers. Then I will create another column called 'PLACE', leaving 'REGION' untouched so that the detailed address is not lost. The 'PLACE' column holds the part of the 'REGION' string that names the locality in Mumbai, i.e. the text after the '-- ' separator. This makes it easier to group restaurants by locality.

In [30]:
df['REGION'].unique()

array(['First International Financial Centre-- Bandra Kurla Complex',
       'Mahim', 'Juhu', 'Bandra Kurla Complex', 'Flea Bazaar Café',
       'Marol', 'Oshiwara-- Andheri West', 'Kamala Mills Compound',
       'Dadar West', 'Khar', 'Lower Parel', 'Pali Hill-- Bandra West',
       'Mumbai CST Area', 'Bhandup', 'Malad West', 'Powai', 'Chembur',
       'Goregaon West', 'Andheri Lokhandwala-- Andheri West',
       'Reclamation-- Bandra West', 'Vile Parle East',
       'Palladium Mall-- Lower Parel', 'CBD-Belapur', 'Borivali West',
       'Vasai', 'Castle Mill-- Thane West', 'Parel',
       'Vasant Vihar-- Thane West', 'Colaba', 'Nariman Point',
       'Naupada-- Thane West', 'Goregaon East', 'Versova-- Andheri West',
       'Santacruz East', 'Mulund West', 'Kandivali East',
       'Panch Pakhadi-- Thane West', 'Mahakali',
       'Near Andheri East Station', 'Airoli', 'Hill Road-- Bandra West',
       'Mira Road', 'Fort', 'Ghodbunder Road', 'Jogeshwari', 'Vashi',
       'Ghatkopar East',

In [31]:
df['PLACE']= df['REGION'].str.replace('[a-zA-Z].+-- ','',regex=True)
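
To make the pattern's behaviour concrete, here is a small comparison on sample strings. The '7 Bungalows-- Andheri West' entry is hypothetical; I am assuming something like it explains values such as '7 Andheri West' in the output. Because the pattern starts matching at the first letter, a leading digit survives, whereas splitting on the separator and keeping the last piece does not have that problem.

```python
import pandas as pd

# Sample strings in the REGION format; the '7 Bungalows' entry is hypothetical.
regions = pd.Series([
    'Oshiwara-- Andheri West',
    'Mahim',
    '7 Bungalows-- Andheri West',
])

# The pattern used above: starts at the first letter, so a leading '7 ' is kept.
by_regex = regions.str.replace('[a-zA-Z].+-- ', '', regex=True)

# Alternative: split on the separator and keep the last piece.
by_split = regions.str.split('-- ').str[-1]

print(by_regex.tolist())
print(by_split.tolist())
```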

In [32]:
df['PLACE'].unique()

array(['Bandra Kurla Complex', 'Mahim', 'Juhu', 'Flea Bazaar Café',
       'Marol', 'Andheri West', 'Kamala Mills Compound', 'Dadar West',
       'Khar', 'Lower Parel', 'Bandra West', 'Mumbai CST Area', 'Bhandup',
       'Malad West', 'Powai', 'Chembur', 'Goregaon West',
       'Vile Parle East', 'CBD-Belapur', 'Borivali West', 'Vasai',
       'Thane West', 'Parel', 'Colaba', 'Nariman Point', 'Goregaon East',
       'Santacruz East', 'Mulund West', 'Kandivali East', 'Mahakali',
       'Near Andheri East Station', 'Airoli', 'Mira Road', 'Fort',
       'Ghodbunder Road', 'Jogeshwari', 'Vashi', 'Ghatkopar East',
       'Bandra East', '7 Andheri West', 'Byculla', 'Kalyan', 'Bhayandar',
       'Malad East', 'Sakinaka', 'Kandivali West', 'Charni Road',
       'Borivali East', 'Chandivali', 'Mohammad Ali Road', 'Kharghar',
       'Matunga East', 'Worli', 'Dadar Shivaji Park', 'Azad Nagar',
       'Ulhasnagar', '4 Bungalows', 'Kalyan West', 'Kopar Khairane',
       'Dahisar East', 'Seawoods', 

## Contd.
As there are still many variations of the same locality, I removed the common suffixes like East and West. Then come the roads and landmarks inside localities, which I grouped under the locality's name. Some names also differ only by a '-' or by upper and lower case. All of this inflates the number of unique values, so I reduced it as much as possible.

In [33]:
df['PLACE'] = df['PLACE'].str.replace(' West| west| East| east','',regex=True)
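
The pattern above removes ' East' and ' West' wherever they occur, which is why 'Near Andheri East Station' becomes 'Near Andheri Station'. If only a trailing direction should go, an anchored, case-insensitive pattern is one option. This is a sketch of that variant on made-up values, not what the notebook runs.

```python
import re
import pandas as pd

# Hypothetical values covering the cases of interest.
places = pd.Series([
    'Andheri West',
    'kalyan east',
    'Near Andheri East Station',
    'Westside',
])

# '$' anchors the match to the end of the string; IGNORECASE covers 'west' and 'West'.
trimmed = places.str.replace(r'\s+(?:east|west)$', '', flags=re.IGNORECASE, regex=True)

print(trimmed.tolist())
```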

In [34]:
df['PLACE'].unique()

array(['Bandra Kurla Complex', 'Mahim', 'Juhu', 'Flea Bazaar Café',
       'Marol', 'Andheri', 'Kamala Mills Compound', 'Dadar', 'Khar',
       'Lower Parel', 'Bandra', 'Mumbai CST Area', 'Bhandup', 'Malad',
       'Powai', 'Chembur', 'Goregaon', 'Vile Parle', 'CBD-Belapur',
       'Borivali', 'Vasai', 'Thane', 'Parel', 'Colaba', 'Nariman Point',
       'Santacruz', 'Mulund', 'Kandivali', 'Mahakali',
       'Near Andheri Station', 'Airoli', 'Mira Road', 'Fort',
       'Ghodbunder Road', 'Jogeshwari', 'Vashi', 'Ghatkopar', '7 Andheri',
       'Byculla', 'Kalyan', 'Bhayandar', 'Sakinaka', 'Charni Road',
       'Chandivali', 'Mohammad Ali Road', 'Kharghar', 'Matunga', 'Worli',
       'Dadar Shivaji Park', 'Azad Nagar', 'Ulhasnagar', '4 Bungalows',
       'Kopar Khairane', 'Dahisar', 'Seawoods', 'Mumbai Central', 'Kurla',
       'Veera Desai Area', 'Chowpatty', 'Old Panvel', 'Sion', 'Tardeo',
       'Mazgaon', 'Prabhadevi', 'Sanpada', 'Ghansoli', 'Virar', 'Girgaum',
       'Mumbra', 'Marve

In [35]:
df['PLACE'] = df['PLACE'].str.replace('4 Bungalows|7 Andheri|Azad Nagar|Near Andheri Station|Veera Desai Area|Mahakali','Andheri',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Bandra Kurla Complex','Bandra',regex=True)
df['PLACE'] = df['PLACE'].str.replace('CBD-Belapur','CBD Belapur',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Girgaon Chowpatty','Chowpatty',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Dadar Shivaji Park','Dadar',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Flea Bazaar Café|Kamala Mills Compound','Lower Parel',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Runwal Green','Mulund',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Mumbai CST Area','Mumbai Central',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Kopar Khairane|Ulwe','Navi Mumbai',regex=True)
df['PLACE'] = df['PLACE'].str.replace('New Panvel|Old Panvel','Panvel',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Kamothe','Sion',regex=True)
df['PLACE'] = df['PLACE'].str.replace('Ghodbunder Road|Majiwada','Thane',regex=True)

In [36]:
df['PLACE'].unique()

array(['Bandra', 'Mahim', 'Juhu', 'Lower Parel', 'Marol', 'Andheri',
       'Dadar', 'Khar', 'Mumbai Central', 'Bhandup', 'Malad', 'Powai',
       'Chembur', 'Goregaon', 'Vile Parle', 'CBD Belapur', 'Borivali',
       'Vasai', 'Thane', 'Parel', 'Colaba', 'Nariman Point', 'Santacruz',
       'Mulund', 'Kandivali', 'Airoli', 'Mira Road', 'Fort', 'Jogeshwari',
       'Vashi', 'Ghatkopar', 'Byculla', 'Kalyan', 'Bhayandar', 'Sakinaka',
       'Charni Road', 'Chandivali', 'Mohammad Ali Road', 'Kharghar',
       'Matunga', 'Worli', 'Ulhasnagar', 'Navi Mumbai', 'Dahisar',
       'Seawoods', 'Kurla', 'Chowpatty', 'Panvel', 'Sion', 'Tardeo',
       'Mazgaon', 'Prabhadevi', 'Sanpada', 'Ghansoli', 'Virar', 'Girgaum',
       'Mumbra', 'Marve', 'Marine Lines', 'Mahalaxmi', 'Chakala',
       'Nalasopara', 'Kalwa', 'Nerul', 'Grant Road', 'Breach Candy',
       'Churchgate', 'Vikhroli', 'Kalbadevi', 'Dombivali', 'Kemps Corner',
       'Malabar Hill', 'Turbhe', 'Kalamboli', 'Wadala', 'Alibaug',
       '

## The Timing Column
Each value in this column holds both the opening hours and the days they apply to. Since almost all the restaurants are open from Monday to Sunday, I kept only the hours, i.e. the text before the first '('. I also confirmed that there are no null values in the column.

In [37]:
df['TIMING'] = df['TIMING'].str.split("(", n = 1, expand = True)[0]
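
If the days are worth keeping as well, str.extract() with named groups can pull both parts into separate columns in one step. A minimal sketch follows; the column names HOURS and DAYS are my own choice for illustration, and like the split above it only looks at the first bracketed group.

```python
import pandas as pd

# Two timing strings in the format shown in the dataset preview.
timings = pd.Series([
    '12noon to 130am(Mon-Sun)',
    '1130am to 3pm,630pm to 12midnight(Mon-Sun)',
])

# HOURS: everything before the first '('; DAYS: the text inside the brackets.
parts = timings.str.extract(r'^(?P<HOURS>[^(]*)\((?P<DAYS>[^)]*)\)')

print(parts)
```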

In [38]:
df['TIMING'].head()

0    12noon to 130am
1         2pm to 1am
2      12noon to 1am
3    12noon to 130am
4      1130am to 1am
Name: TIMING, dtype: object

In [39]:
df['TIMING'].isnull().sum()

np.int64(0)

## The Cuisine Category Column
The 'RESTAURANT_TYPE' column tells us what kind of establishment a restaurant is, while this column lists the cuisines it serves in detail. I first split each comma-separated string into a list of cuisines, and then used the explode() function, which gives every list element its own row, to count how often each cuisine appears.

In [41]:
df['CUISINE_CATEGORY'] = df['CUISINE_CATEGORY'].str.split(',')

In [42]:
df['CUISINE_CATEGORY'].explode().value_counts()

CUISINE_CATEGORY
Chinese         5231
North Indian    5170
Fast Food       2737
South Indian    1307
Mughlai         1157
                ... 
Pakistani          1
Raw Meats          1
Nepalese           1
Armenian           1
Oriya              1
Name: count, Length: 117, dtype: int64
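
The split-then-explode step is easier to follow on a tiny example. The two rows below are taken from the preview at the top of the notebook with the cuisine lists shortened; the extra str.strip() is an assumption on my part, added in case some entries carry spaces after the commas.

```python
import pandas as pd

toy = pd.DataFrame({
    'NAME': ['Hitchki', 'Chin Chin Chu'],
    'CUISINE_CATEGORY': ['Modern Indian,North Indian,Chinese', 'Asian,Chinese'],
})

# Step 1: each string becomes a list of cuisines.
as_lists = toy['CUISINE_CATEGORY'].str.split(',')

# Step 2: each list element gets its own row; the original index is repeated.
one_per_row = as_lists.explode().str.strip()

# Step 3: count how many restaurants offer each cuisine.
counts = one_per_row.value_counts()

print(one_per_row)
print(counts)
```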

In [43]:
df['CUISINE_CATEGORY'].explode().unique()

array(['Modern Indian', 'North Indian', 'Chinese', 'Momos', 'Biryani',
       'Continental', 'American', 'Fast Food', 'Desserts', 'Ice Cream',
       'Beverages', 'Asian', 'Street Food', 'Lucknowi', 'Mexican',
       'Italian', 'Mughlai', 'Thai', 'European', 'Seafood', 'Finger Food',
       'Burger', 'Salad', 'Healthy Food', 'Middle Eastern', 'Lebanese',
       'Kebab', 'Cafe', 'Sandwich', 'Iranian', 'Bakery', 'South Indian',
       'Maharashtrian', 'Pizza', 'Mediterranean', 'North Eastern',
       'Andhra', 'Mithai', 'Japanese', 'Sushi', 'Rolls', 'Parsi',
       'Juices', 'Malwani', 'French', 'Indian', 'Malaysian',
       'Roast Chicken', 'BBQ', 'Goan', 'Konkan', 'Mangalorean',
       'Chettinad', 'Sindhi', 'Burmese', 'Coffee', 'Korean', 'Gujarati',
       'Bengali', 'Kerala', 'Vietnamese', 'Singaporean', 'Indonesian',
       'Turkish', 'Arabian', 'British', 'Tea', 'Mishti', 'Tamil',
       'Bar Food', 'Awadhi', 'Rajasthani', 'Wraps', 'Charcoal Chicken',
       'Steak', 'Spanish', 'Ge

In [44]:
df['CUISINE_CATEGORY'].explode().value_counts().nlargest(5)

CUISINE_CATEGORY
Chinese         5231
North Indian    5170
Fast Food       2737
South Indian    1307
Mughlai         1157
Name: count, dtype: int64

In [45]:
df['RESTAURANT_TYPE'].unique()

array(['Casual Dining', 'Dessert Parlor', 'Bar', 'Café', 'Quick Bites',
       'Bakery', 'Sweet Shop', 'none', 'Food Court', 'Fine Dining',
       'Beverage Shop', 'Pub', 'Food Truck', 'Dhaba', 'Lounge', 'Kiosk',
       'Microbrewery', 'Paan Shop', 'Irani Cafe', 'Confectionery', 'Mess',
       'Bhojanalya'], dtype=object)

## Download The Data
Having put a lot of effort into shaping the data to my needs, I saved the modified dataset as a CSV file so that I can use it freely for building the dashboard.

In [46]:
df.to_csv('Mumbai_Restaurants.csv', index=False)
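
One caveat worth checking before export: 'CUISINE_CATEGORY' now holds Python lists, and to_csv() writes a list as its text representation (square brackets and quotes included). If a plain comma-separated string is preferred in the file, the lists can be joined back first. The sketch below uses a made-up one-row frame and writes to an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Hypothetical row with the cuisine column already split into a list.
toy = pd.DataFrame({
    'NAME': ['Chin Chin Chu'],
    'CUISINE_CATEGORY': [['Asian', 'Chinese']],
})

# Work on a copy so the list column stays available for analysis.
export = toy.copy()
export['CUISINE_CATEGORY'] = export['CUISINE_CATEGORY'].str.join(',')

# Write to a buffer here; a file path works the same way.
buffer = io.StringIO()
export.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```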

In [176]:
from IPython.display import FileLink
FileLink('Mumbai_Restaurants.csv')