## Zomato Restaurants EDA & Cleaning

### Import Libraries

In [1]:
import pandas as pd

### Load Dataset

In [2]:
df = pd.read_csv("zomato.csv")

### Show first 5 rows

In [3]:
df.head()

| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | dish_liked | cuisines | approx_cost(for two people) | reviews_list | menu_item | listed_in(type) | listed_in(city) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zomato.com/bangalore/jalsa-banasha... | 942, 21st Main Road, 2nd Stage, Banashankari, ... | Jalsa | Yes | Yes | 4.1/5 | 775 | 080 42297555\r\n+91 9743772233 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800 | [('Rated 4.0', 'RATED\n A beautiful place to ... | [] | Buffet | Banashankari |
| 1 | https://www.zomato.com/bangalore/spice-elephan... | 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... | Spice Elephant | Yes | No | 4.1/5 | 787 | 080 41714161 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800 | [('Rated 4.0', 'RATED\n Had been here for din... | [] | Buffet | Banashankari |
| 2 | https://www.zomato.com/SanchurroBangalore?cont... | 1112, Next to KIMS Medical College, 17th Cross... | San Churro Cafe | Yes | No | 3.8/5 | 918 | +91 9663487993 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800 | [('Rated 3.0', "RATED\n Ambience is not that ... | [] | Buffet | Banashankari |
| 3 | https://www.zomato.com/bangalore/addhuri-udupi... | 1st Floor, Annakuteera, 3rd Stage, Banashankar... | Addhuri Udupi Bhojana | No | No | 3.7/5 | 88 | +91 9620009302 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300 | [('Rated 4.0', "RATED\n Great food and proper... | [] | Buffet | Banashankari |
| 4 | https://www.zomato.com/bangalore/grand-village... | 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... | Grand Village | No | No | 3.8/5 | 166 | +91 8026612447\r\n+91 9901210005 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600 | [('Rated 4.0', 'RATED\n Very good restaurant ... | [] | Buffet | Banashankari |


### Dataset Shape, Info & Null Check

In [4]:
print("Dataset shape:")
df.shape

Dataset shape:


(51717, 17)

In [5]:
print("Basic info:")
df.info()

Basic info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   url                          51717 non-null  object
 1   address                      51717 non-null  object
 2   name                         51717 non-null  object
 3   online_order                 51717 non-null  object
 4   book_table                   51717 non-null  object
 5   rate                         43942 non-null  object
 6   votes                        51717 non-null  int64 
 7   phone                        50509 non-null  object
 8   location                     51696 non-null  object
 9   rest_type                    51490 non-null  object
 10  dish_liked                   23639 non-null  object
 11  cuisines                     51672 non-null  object
 12  approx_cost(for two people)  51371 non-null  object
 13  reviews_list       

In [6]:
print("Missing values per column:")
df.isnull().sum()

Missing values per column:


url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7775
votes                              0
phone                           1208
location                          21
rest_type                        227
dish_liked                     28078
cuisines                          45
approx_cost(for two people)      346
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
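
Raw counts are easier to judge next to the share of rows they represent; `dish_liked`, for instance, is missing in 28,078 of 51,717 rows, a little over half. The sketch below shows one way to report both. The helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame standing in for the Zomato data (values are made up)
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rate": ["4.1/5", None, "3.8/5", None],
    "dish_liked": [None, None, None, "Pasta"],
})
print(missing_summary(toy))
```

On the real data this would be called as `missing_summary(df)` right after loading.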

### Drop Unneeded Columns

In [7]:
df.drop(['url', 'phone', 'address'], axis=1, inplace=True)

### Handle Nulls & Duplicates

In [8]:
df.drop_duplicates(inplace=True)

In [9]:
# Fill all four columns in one call; "0/5" marks restaurants without a rating
df.fillna({
    "rate": "0/5",
    "location": "Unknown",
    "rest_type": "Others",
    "dish_liked": "Not Mentioned",
}, inplace=True)

### Clean Ratings

In [10]:
# Convert "4.1/5" → 4.1; "NEW" and "-" (no rating yet) become 0, like the "0/5" fill above
df['rate'] = df['rate'].apply(lambda x: 0.0 if x in ('NEW', '-') else float(str(x).split('/')[0]))
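
Coding unrated restaurants as 0 keeps the column numeric, but it also mixes placeholders with real ratings. An alternative, sketched here under the assumption that missing values are acceptable downstream, is to parse unrated entries to `NaN` so pandas skips them in aggregates. The function name `parse_rate` and the sample values are hypothetical; this is not the notebook's method.

```python
import numpy as np
import pandas as pd

def parse_rate(value) -> float:
    """Convert '4.1/5' to 4.1; return NaN for missing, 'NEW', '-' or unparsable text."""
    if pd.isna(value):
        return np.nan
    text = str(value).strip()
    if text in {"NEW", "-", ""}:
        return np.nan
    try:
        # Keep only the part before the slash, e.g. "4.1" from "4.1/5"
        return float(text.split("/")[0])
    except ValueError:
        return np.nan

# Made-up sample covering the cases the notebook handles
raw = pd.Series(["4.1/5", "3.8 /5", "NEW", "-", None])
parsed = raw.apply(parse_rate)
print(parsed)
```

With this variant the `fillna` for `rate` would be skipped, and `parsed.mean()` averages only the restaurants that have a rating.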

### EDA Questions to Answer

1. Which location has the most restaurants?

In [11]:
df['location'].value_counts().head()

location
BTM                      5109
HSR                      2522
Koramangala 5th Block    2503
JP Nagar                 2234
Whitefield               2141
Name: count, dtype: int64

2. Which type of restaurant is most common?

In [12]:
df['rest_type'].value_counts().head()

rest_type
Quick Bites       19102
Casual Dining     10319
Cafe               3730
Delivery           2600
Dessert Parlor     2263
Name: count, dtype: int64

3. What is the average rating across restaurants? (Unrated restaurants were set to 0 during cleaning, so they pull this mean down.)

In [13]:
df['rate'].mean()

np.float64(2.9821736941959966)
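
Because missing, "NEW" and "-" ratings were all stored as 0, the mean above averages real ratings together with placeholders. A minimal sketch of how the two could be separated, assuming no restaurant has a genuine rating of exactly 0; the toy values are made up and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy ratings: 0.0 stands for "unrated", as in the cleaned column
rates = pd.Series([4.1, 3.8, 0.0, 0.0, 3.7])

# Turn the 0 placeholders back into NaN so they are ignored by mean()
rated_only = rates.replace(0, np.nan)

mean_all = rates.mean()                    # includes the zero placeholders
mean_rated = rated_only.mean()             # rated restaurants only
share_unrated = rated_only.isna().mean()   # fraction of rows without a rating

print(mean_all, mean_rated, share_unrated)
```

On the notebook's data the same idea would be `df['rate'].replace(0, np.nan).mean()`.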

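4. Which dishes are liked most overall? (`dish_liked` holds comma-separated lists, so this counts exact list strings rather than individual dishes.)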

In [14]:
df['dish_liked'].value_counts().head()

dish_liked
Not Mentioned      28027
Biryani              182
Chicken Biryani       73
Friendly Staff        69
Waffles               68
Name: count, dtype: int64
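
Since each `dish_liked` cell can list several dishes, counting individual dishes requires splitting the strings first. The sketch below shows one way to do that with `str.split` and `explode`; the helper `count_items` and the sample values are assumptions for illustration, and "Not Mentioned" is the placeholder filled in earlier.

```python
import pandas as pd

def count_items(column: pd.Series, placeholder: str = "Not Mentioned") -> pd.Series:
    """Count individual comma-separated items, ignoring the placeholder and NaN."""
    items = (
        column[column != placeholder]
        .dropna()
        .str.split(",")   # "Pasta, Biryani" -> ["Pasta", " Biryani"]
        .explode()        # one row per item
        .str.strip()      # remove the spaces left around each item
    )
    return items[items != ""].value_counts()

# Made-up sample in the same shape as the dish_liked column
toy = pd.Series(["Pasta, Biryani", "Biryani", "Not Mentioned", "Momos, Biryani , Pasta"])
counts = count_items(toy)
print(counts)
```

The same helper should also work for the `cuisines` column, where each cell is likewise a comma-separated list.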

5. Which cuisines are most offered? (Counted per exact cuisine combination, as stored in `cuisines`.)

In [15]:
df['cuisines'].value_counts().head()

cuisines
North Indian             2907
North Indian, Chinese    2381
South Indian             1826
Biryani                   915
Bakery, Desserts          910
Name: count, dtype: int64

6. What is the average rating for each cuisine combination?

In [16]:
df.groupby('cuisines')['rate'].mean().sort_values(ascending=False).head(10)

cuisines
Healthy Food, Salad, Mediterranean                               4.900000
Continental, North Indian, Italian, South Indian, Finger Food    4.900000
Asian, Chinese, Thai, Momos                                      4.900000
North Indian, European, Mediterranean, BBQ                       4.800000
Asian, Mediterranean, North Indian, BBQ                          4.800000
European, Mediterranean, North Indian, BBQ                       4.789474
American, Tex-Mex, Burger, BBQ, Mexican                          4.750000
Sushi, Japanese, Chinese, Thai                                   4.700000
Italian, American, Pizza                                         4.700000
Chinese, American, Continental, Italian, North Indian            4.700000
Name: rate, dtype: float64
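
Round values such as exactly 4.9 suggest that the top groups may contain very few restaurants, in which case the ranking reflects single outlets more than cuisines. One hedged way to check is to compute the group size alongside the mean and require a minimum count. The helper `top_groups_by_mean` and the threshold are illustrative assumptions; pick `min_count` to suit the data.

```python
import pandas as pd

def top_groups_by_mean(frame: pd.DataFrame, group_col: str, value_col: str,
                       min_count: int = 50, n: int = 10) -> pd.DataFrame:
    """Rank groups by mean value, keeping only groups with at least min_count rows."""
    stats = frame.groupby(group_col)[value_col].agg(["mean", "count"])
    return (
        stats[stats["count"] >= min_count]
        .sort_values("mean", ascending=False)
        .head(n)
    )

# Made-up data: cuisine "A" has a high rating but only one restaurant
toy = pd.DataFrame({
    "cuisines": ["A", "B", "B", "B", "C", "C", "C"],
    "rate":     [4.9, 4.0, 4.2, 4.1, 3.0, 3.2, 3.1],
})
ranked = top_groups_by_mean(toy, "cuisines", "rate", min_count=3)
print(ranked)
```

For the notebook's frame the call would look like `top_groups_by_mean(df, 'cuisines', 'rate')`.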

### Rate Distribution Check

In [17]:
df['rate'].describe()

count    51654.000000
mean         2.982174
std          1.516141
min          0.000000
25%          3.000000
50%          3.600000
75%          3.900000
max          4.900000
Name: rate, dtype: float64

### Save Cleaned Dataset

In [18]:
df.to_csv("zomato_cleaned.csv", index=False)