# **Hackathon Tasks**

**Task1 (Description)** - Apply exploratory data analysis and answer the questions shared with you in the Google Form.

**Task2 (Description)** - Create a cuisine-specific map using the Folium library.

**Task3 (Description)** - Create an interactive density map using Folium.

## **TASK-1**

In [1]:
# Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Step 2: Load the Datasets
zomato = pd.read_csv(r"C:\Users\RAMANA\Downloads\zomato_data.csv", encoding='latin-1')
geo = pd.read_csv(r"C:\Users\RAMANA\Downloads\Geographical Coordinates.csv")

# Preview the data
zomato.head()


Unnamed: 0,online_order,book_table,rate,votes,rest_type,dish_liked,cuisines,approx_costfor_two_people,listed_intype,listed_incity
0,Yes,Yes,4.1/5,775,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,Buffet,Banashankari
1,Yes,No,4.1/5,787,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,Buffet,Banashankari
2,Yes,No,3.8/5,918,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,Buffet,Banashankari
3,No,No,3.7/5,88,Quick Bites,Masala Dosa,"South Indian, North Indian",300,Buffet,Banashankari
4,No,No,3.8/5,166,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,Buffet,Banashankari


In [3]:
# Step 3: Basic Exploration
print("Zomato Shape:", zomato.shape)
print("Geo Shape:", geo.shape)

# Check missing values
print(zomato.isnull().sum())
print(geo.isnull().sum())      

Zomato Shape: (51717, 10)
Geo Shape: (26, 3)
online_order                     0
book_table                       0
rate                          7775
votes                            0
rest_type                      227
dish_liked                   28078
cuisines                        45
approx_costfor_two_people      346
listed_intype                    0
listed_incity                    0
dtype: int64
listed_incity    0
Latitude         0
Longitude        0
dtype: int64
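
Raw counts are easier to compare when expressed as a share of all rows (in the output above, `rate` is missing in roughly 15% of rows and `dish_liked` in roughly 54%). A minimal sketch of such a report, using a small made-up frame rather than the Zomato data; `missing_report` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Toy data for illustration only (not the Zomato dataset)
toy = pd.DataFrame({
    "rate": ["4.1/5", np.nan, "3.8/5", np.nan],
    "votes": [775, 787, 918, 88],
})
print(missing_report(toy))
```

Calling `missing_report(zomato)` on the real frame should give the same counts as `isnull().sum()`, with the percentage column alongside.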


## **Data Cleaning & Preprocessing**


**Step 1: Rating Column (rate)**

In [6]:
# Step 1: Replace '-' with NaN
zomato['rate'] = zomato['rate'].replace('-', np.nan)

# Step 2: Remove '/5' and keep only numeric part
zomato['rate'] = zomato['rate'].str.replace('/5', '', regex=False).str.strip()

# Step 3: Convert to float
zomato['rate'] = pd.to_numeric(zomato['rate'], errors='coerce')

# Step 4: Fill missing values with median rating
median_rating = zomato['rate'].median()
zomato['rate'] = zomato['rate'].fillna(median_rating)

# ✅ Done: Check the result
print(zomato['rate'].describe())


count    51717.000000
mean         3.700362
std          0.395391
min          1.800000
25%          3.500000
50%          3.700000
75%          3.900000
max          4.900000
Name: rate, dtype: float64
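
The rating steps can be tried on a handful of values before running them on the full column. The sketch below is an illustration under assumptions: the toy values (including the `'NEW'` placeholder) are made up, and `clean_rating` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def clean_rating(raw: pd.Series) -> pd.Series:
    """Convert strings such as '4.1/5' to floats; anything unparsable becomes NaN."""
    stripped = raw.str.replace("/5", "", regex=False).str.strip()
    return pd.to_numeric(stripped, errors="coerce")


# Toy values for illustration; 'NEW' is an assumed placeholder, not confirmed in the source
raw = pd.Series(["4.1/5", "3.8 /5", "-", "NEW", np.nan])
cleaned = clean_rating(raw)

# Assign the result back instead of using inplace=True on a column selection
filled = cleaned.fillna(cleaned.median())
print(filled.tolist())
```

Because `errors="coerce"` turns every unparsable string into `NaN`, a separate replacement of `'-'` is not strictly needed in this variant.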


**Step 2: Cost Column (approx_costfor_two_people)**

In [8]:
# Normalize column names first (if not done already) so the cost column has a predictable name
zomato.columns = (zomato.columns.str.strip().str.lower()
                  .str.replace('(', '', regex=False).str.replace(')', '', regex=False)
                  .str.replace(' ', '_', regex=False))

# Step 1: Remove commas
zomato['approx_costfor_two_people'] = zomato['approx_costfor_two_people'].str.replace(',', '', regex=False)

# Step 2: Convert to numeric
zomato['approx_costfor_two_people'] = pd.to_numeric(zomato['approx_costfor_two_people'], errors='coerce')

# Step 3: Fill missing values with median cost
median_cost = zomato['approx_costfor_two_people'].median()
zomato['approx_costfor_two_people'] = zomato['approx_costfor_two_people'].fillna(median_cost)

# ✅ Done: Check results
print(zomato['approx_costfor_two_people'].describe())


count    51717.000000
mean       554.391689
std        437.563723
min         40.000000
25%        300.000000
50%        400.000000
75%        650.000000
max       6000.000000
Name: approx_costfor_two_people, dtype: float64


**Step 3: Categorical Columns**

In [10]:
# Step 1: Replace NaN in 'dish_liked' with "Not Available"
zomato['dish_liked'] = zomato['dish_liked'].fillna('Not Available')

# Step 2: Replace NaN in 'cuisines' with "Other"
zomato['cuisines'] = zomato['cuisines'].fillna('Other')

# Step 3: Replace NaN in 'rest_type' with "Unknown"
zomato['rest_type'] = zomato['rest_type'].fillna('Unknown')

# ✅ Done: Quick check
print(zomato[['dish_liked', 'cuisines', 'rest_type']].isna().sum())


dish_liked    0
cuisines      0
rest_type     0
dtype: int64

In [12]:
print(zomato.columns)


Index(['online_order', 'book_table', 'rate', 'votes', 'rest_type',
       'dish_liked', 'cuisines', 'approx_costfor_two_people', 'listed_intype',
       'listed_incity'],
      dtype='object')


**Step 4: Votes Column**

In [20]:
# Step 1: Ensure 'votes' is numeric
zomato['votes'] = pd.to_numeric(zomato['votes'], errors='coerce')

# Step 2: Fill NaN with median
median_votes = zomato['votes'].median()
zomato['votes'] = zomato['votes'].fillna(median_votes)

# ✅ Done: Verify
print(zomato['votes'].describe())


count    51717.000000
mean       283.697527
std        803.838853
min          0.000000
25%          7.000000
50%         41.000000
75%        198.000000
max      16832.000000
Name: votes, dtype: float64


**Step 5: Binary Encoding**

In [23]:
# Step 1: Map 'Yes' → 1, 'No' → 0 for both columns
zomato['online_order'] = zomato['online_order'].map({'Yes': 1, 'No': 0})
zomato['book_table'] = zomato['book_table'].map({'Yes': 1, 'No': 0})

# ✅ Done: Verify conversion
print(zomato[['online_order', 'book_table']].head())


   online_order  book_table
0             1           1
1             1           0
2             1           0
3             0           0
4             0           0
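
`Series.map` with a dictionary returns `NaN` for any value that is not a key, so an unexpected spelling would silently become a missing value. A small sketch of how that could be checked; the lowercase `'yes'` is a made-up inconsistency, not something observed in the dataset:

```python
import pandas as pd

# Toy column for illustration; the lowercase 'yes' is a made-up inconsistency
flags = pd.Series(["Yes", "No", "yes", "No"])

mapping = {"Yes": 1, "No": 0}
encoded = flags.map(mapping)

# Values missing from the mapping come back as NaN, so report them explicitly
unmapped = flags[encoded.isna()].unique().tolist()
print("Unmapped values:", unmapped)

# Normalizing case and whitespace first makes the mapping more robust
normalized = flags.str.strip().str.title().map(mapping).astype("int64")
print(normalized.tolist())
```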


**Step 6: Data Type Conversion**

In [26]:
# Convert 'rate' to float (should already be, but reconfirm)
zomato['rate'] = zomato['rate'].astype(float)

# Convert 'votes' to integer
zomato['votes'] = zomato['votes'].astype(int)

# Convert 'approx_costfor_two_people' to integer
zomato['approx_costfor_two_people'] = zomato['approx_costfor_two_people'].astype(int)

# ✅ Done: Check data types
print(zomato.dtypes[['rate', 'votes', 'approx_costfor_two_people']])


rate                         float64
votes                          int32
approx_costfor_two_people      int32
dtype: object


In [28]:
print(zomato.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   online_order               51717 non-null  int64  
 1   book_table                 51717 non-null  int64  
 2   rate                       51717 non-null  float64
 3   votes                      51717 non-null  int32  
 4   rest_type                  51717 non-null  object 
 5   dish_liked                 51717 non-null  object 
 6   cuisines                   51717 non-null  object 
 7   approx_costfor_two_people  51717 non-null  int32  
 8   listed_intype              51717 non-null  object 
 9   listed_incity              51717 non-null  object 
dtypes: float64(1), int32(2), int64(2), object(5)
memory usage: 3.6+ MB
None


In [30]:
print(zomato.isnull())

       online_order  book_table   rate  votes  rest_type  dish_liked  \
0             False       False  False  False      False       False   
1             False       False  False  False      False       False   
2             False       False  False  False      False       False   
3             False       False  False  False      False       False   
4             False       False  False  False      False       False   
...             ...         ...    ...    ...        ...         ...   
51712         False       False  False  False      False       False   
51713         False       False  False  False      False       False   
51714         False       False  False  False      False       False   
51715         False       False  False  False      False       False   
51716         False       False  False  False      False       False   

       cuisines  approx_costfor_two_people  listed_intype  listed_incity  
0         False                      False          False   

In [32]:
print(zomato.sum())

online_order                                                             30444
book_table                                                                6449
rate                                                                  191371.6
votes                                                                 14671985
rest_type                    Casual DiningCasual DiningCafe, Casual DiningQ...
dish_liked                   Pasta, Lunch Buffet, Masala Papad, Paneer Laja...
cuisines                     North Indian, Mughlai, ChineseChinese, North I...
approx_costfor_two_people                                             28671475
listed_intype                BuffetBuffetBuffetBuffetBuffetBuffetBuffetCafe...
listed_incity                BanashankariBanashankariBanashankariBanashanka...
dtype: object


In [33]:
print(zomato.describe())

       online_order    book_table          rate         votes  \
count  51717.000000  51717.000000  51717.000000  51717.000000   
mean       0.588665      0.124698      3.700362    283.697527   
std        0.492080      0.330379      0.395391    803.838853   
min        0.000000      0.000000      1.800000      0.000000   
25%        0.000000      0.000000      3.500000      7.000000   
50%        1.000000      0.000000      3.700000     41.000000   
75%        1.000000      0.000000      3.900000    198.000000   
max        1.000000      1.000000      4.900000  16832.000000   

       approx_costfor_two_people  
count               51717.000000  
mean                  554.391689  
std                   437.563723  
min                    40.000000  
25%                   300.000000  
50%                   400.000000  
75%                   650.000000  
max                  6000.000000  


In [34]:
zomato.isna().sum()

online_order                 0
book_table                   0
rate                         0
votes                        0
rest_type                    0
dish_liked                   0
cuisines                     0
approx_costfor_two_people    0
listed_intype                0
listed_incity                0
dtype: int64

In [38]:
zomato.to_csv('Zomato.csv',index=False)

In [40]:
import os
print(os.getcwd())

C:\Users\RAMANA\Downloads


In [56]:
# Reload the cleaned file saved above as 'Zomato.csv'
Zomato = pd.read_csv(r"C:\Users\RAMANA\Downloads\Zomato.csv")

In [58]:
merged_df = pd.merge(Zomato, geo, on='listed_incity', how='left')


In [60]:
merged_df

Unnamed: 0,online_order,book_table,rate,votes,rest_type,dish_liked,cuisines,approx_costfor_two_people,listed_intype,listed_incity,Latitude,Longitude
0,1,1,4.1,775,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,Buffet,Banashankari,12.939333,77.553982
1,1,0,4.1,787,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,Buffet,Banashankari,12.939333,77.553982
2,1,0,3.8,918,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,Buffet,Banashankari,12.939333,77.553982
3,0,0,3.7,88,Quick Bites,Masala Dosa,"South Indian, North Indian",300,Buffet,Banashankari,12.939333,77.553982
4,0,0,3.8,166,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,Buffet,Banashankari,12.939333,77.553982
...,...,...,...,...,...,...,...,...,...,...,...,...
51712,0,0,3.6,27,Bar,Not Available,Continental,1500,Pubs and bars,Whitefield,,
51713,0,0,3.7,0,Bar,Not Available,Finger Food,600,Pubs and bars,Whitefield,,
51714,0,0,3.7,0,Bar,Not Available,Finger Food,2000,Pubs and bars,Whitefield,,
51715,0,1,4.3,236,Bar,"Cocktails, Pizza, Buttermilk",Finger Food,2500,Pubs and bars,Whitefield,,
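
The last rows of the output show Whitefield with empty `Latitude` and `Longitude`, which suggests the left join found no matching name for it in `geo`. One way to list such localities is the `indicator` option of `merge`. The sketch below uses toy frames; the trailing space in `'Whitefield '` is only an assumed cause, and the Whitefield coordinates are placeholders.

```python
import pandas as pd

# Toy frames for illustration; the trailing space in 'Whitefield ' is an assumed cause
restaurants = pd.DataFrame({
    "listed_incity": ["Banashankari", "Whitefield", "BTM"],
    "votes": [775, 27, 41],
})
coords = pd.DataFrame({
    "listed_incity": ["Banashankari", "Whitefield ", "BTM"],
    "Latitude": [12.939333, 12.97, 12.91636],    # Whitefield value is a placeholder
    "Longitude": [77.553982, 77.75, 77.604733],  # Whitefield value is a placeholder
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
checked = restaurants.merge(coords, on="listed_incity", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "listed_incity"].unique().tolist()
print("Localities without coordinates:", unmatched)

# One possible fix: strip whitespace from the key before merging again
coords["listed_incity"] = coords["listed_incity"].str.strip()
fixed = restaurants.merge(coords, on="listed_incity", how="left")
print(fixed[["Latitude", "Longitude"]].isna().sum().sum())
```

Rows without coordinates are dropped before mapping, so any unmatched locality is absent from the maps built later.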


In [62]:
geo

Unnamed: 0,listed_incity,Latitude,Longitude
0,Banashankari,12.939333,77.553982
1,Bannerghatta Road,12.95266,77.605048
2,Basavanagudi,12.941726,77.575502
3,Bellandur,12.925352,77.675941
4,Brigade Road,12.967358,77.606435
5,Brookefield,12.963814,77.722437
6,BTM,12.91636,77.604733
7,Church Street,12.974914,77.605247
8,Electronic City,12.84876,77.648253
9,Frazer Town,12.998683,77.615525


In [64]:
print(merged_df.columns)

Index(['online_order', 'book_table', 'rate', 'votes', 'rest_type',
       'dish_liked', 'cuisines', 'approx_costfor_two_people', 'listed_intype',
       'listed_incity', 'Latitude', 'Longitude'],
      dtype='object')


In [66]:
# Check the column names of the cleaned Zomato restaurant data
print("Columns in Zomato:", Zomato.columns)

# Check the column names of the geographical coordinates data
print("Columns in geo:", geo.columns)

# Merge the datasets on the 'listed_incity' column (left join keeps every restaurant)
merged_df = pd.merge(Zomato, geo, on='listed_incity', how='left')

# Check the column names of the merged dataset
print("Columns in merged_df:", merged_df.columns)


Columns in Zomato: Index(['online_order', 'book_table', 'rate', 'votes', 'rest_type',
       'dish_liked', 'cuisines', 'approx_costfor_two_people', 'listed_intype',
       'listed_incity'],
      dtype='object')
Columns in geo: Index(['listed_incity', 'Latitude', 'Longitude'], dtype='object')
Columns in merged_df: Index(['online_order', 'book_table', 'rate', 'votes', 'rest_type',
       'dish_liked', 'cuisines', 'approx_costfor_two_people', 'listed_intype',
       'listed_incity', 'Latitude', 'Longitude'],
      dtype='object')


In [93]:
print(geo.columns)


Index(['listed_incity', 'Latitude', 'Longitude'], dtype='object')


In [95]:
geo[['Latitude', 'Longitude']]

Unnamed: 0,Latitude,Longitude
0,12.939333,77.553982
1,12.95266,77.605048
2,12.941726,77.575502
3,12.925352,77.675941
4,12.967358,77.606435
5,12.963814,77.722437
6,12.91636,77.604733
7,12.974914,77.605247
8,12.84876,77.648253
9,12.998683,77.615525


In [103]:
!pip install folium

Collecting folium
  Downloading folium-0.19.5-py2.py3-none-any.whl.metadata (4.1 kB)
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.8.1-py3-none-any.whl.metadata (1.5 kB)
Downloading folium-0.19.5-py2.py3-none-any.whl (110 kB)

## **TASK-3**

In [105]:
import folium
from folium.plugins import HeatMap

# Drop rows with missing coordinates
map_data = merged_df.dropna(subset=['Latitude', 'Longitude'])

# Initialize folium map centered around Bangalore
bangalore_center = [12.9716, 77.5946]
restaurant_map = folium.Map(location=bangalore_center, zoom_start=11)

# Create list of lat-long pairs for HeatMap
# (coordinates come from the locality table, so all restaurants in a locality share one point)
heat_data = list(zip(map_data['Latitude'], map_data['Longitude']))

# Add HeatMap to the map
HeatMap(heat_data).add_to(restaurant_map)

# Display map
restaurant_map


## **TASK-2**

In [109]:
# Filter restaurants that serve Italian cuisine (case-insensitive)
italian_df = merged_df[merged_df['cuisines'].str.contains('Italian', case=False, na=False)]

# Drop rows with missing coordinates
italian_df = italian_df.dropna(subset=['Latitude', 'Longitude'])

print(f"Total Italian restaurants found: {len(italian_df)}")


Total Italian restaurants found: 3046


In [111]:
# Initialize map centered at Bangalore
italian_map = folium.Map(location=[12.9716, 77.5946], zoom_start=11)

# Add markers for Italian restaurants
for index, row in italian_df.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"Cuisine: {row['cuisines']} | Rating: {row['rate']}",
        icon=folium.Icon(color='red', icon='cutlery', prefix='fa')
    ).add_to(italian_map)

# Show the map
italian_map
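
Because the coordinates come from the locality table, every restaurant in a locality shares the same point, so the individual markers stack on top of each other. A possible alternative is to aggregate to one row per locality first. The sketch below uses pandas only and a toy stand-in for `italian_df`; the column names `restaurants` and `mean_rating` are introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for italian_df (values are illustrative only)
toy_italian = pd.DataFrame({
    "listed_incity": ["BTM", "BTM", "Banashankari"],
    "rate": [4.0, 3.6, 3.8],
    "Latitude": [12.91636, 12.91636, 12.939333],
    "Longitude": [77.604733, 77.604733, 77.553982],
})

# One row per locality: shared coordinates, restaurant count, mean rating
per_locality = (
    toy_italian
    .groupby(["listed_incity", "Latitude", "Longitude"], as_index=False)
    .agg(restaurants=("rate", "count"), mean_rating=("rate", "mean"))
)

# [lat, lon, weight] rows, the triple form that HeatMap also accepts
weighted_points = per_locality[["Latitude", "Longitude", "restaurants"]].values.tolist()
print(per_locality)
```

Each row of `per_locality` could then feed a single `folium.Marker` whose popup shows the count and mean rating, while `weighted_points` could be passed to `HeatMap` in place of unweighted pairs.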

## **1. What is the shape of the given dataset?**

In [113]:
# Check the shape of the dataset
print(Zomato.shape)

(51717, 10)


## **2. How many restaurants serve North Indian cuisine?**

In [115]:
# Filter restaurants that serve North Indian cuisine
north_indian_restaurants = merged_df[merged_df['cuisines'].str.contains('North Indian', case=False, na=False)]

# Count the number of North Indian restaurants
north_indian_count = len(north_indian_restaurants)
print(f"Number of North Indian restaurants: {north_indian_count}")


Number of North Indian restaurants: 21085


## **3. What cuisine is most commonly offered by restaurants in Bangalore?**

In [117]:
# Split cuisines into individual cuisines (in case multiple cuisines are listed for each restaurant)
cuisines_split = merged_df['cuisines'].str.split(',').explode().str.strip()

# Count the occurrences of each cuisine
cuisine_counts = cuisines_split.value_counts()

# Display the most common cuisines
print("Most common cuisines offered by restaurants in Bangalore:")
print(cuisine_counts.head())


Most common cuisines offered by restaurants in Bangalore:
cuisines
North Indian    21085
Chinese         15547
South Indian     8644
Fast Food        8096
Biryani          6492
Name: count, dtype: int64
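
For North Indian, the substring filter from question 2 and the exact count here agree (21085 in both outputs). That need not hold for shorter names, since `str.contains` matches substrings while split-and-explode compares whole cuisine names. A toy illustration with made-up rows:

```python
import pandas as pd

# Toy cuisines column for illustration only
cuisines = pd.Series([
    "North Indian, Chinese",
    "South Indian",
    "Indian, Cafe",
    "Cafe, Italian",
])

# Substring match: 'Indian' also hits 'North Indian' and 'South Indian'
substring_hits = int(cuisines.str.contains("Indian", case=False, na=False).sum())

# Exact match: split into individual cuisine names, then compare whole tokens
tokens = cuisines.str.split(",").explode().str.strip()
exact_hits = int((tokens == "Indian").groupby(level=0).any().sum())

print(substring_hits, exact_hits)
print(tokens.value_counts().head(3))
```

The `groupby(level=0).any()` step collapses the exploded tokens back to one flag per restaurant, so a restaurant is counted once even if a name repeats in its list.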


## **4. Which locality in Bangalore has the highest average cost for dining (for two people)?**

In [119]:
# Group by locality (listed_incity) and calculate the mean cost for two people
avg_cost_by_locality = merged_df.groupby('listed_incity')['approx_costfor_two_people'].mean()

# Find the locality with the highest average cost
highest_avg_cost_locality = avg_cost_by_locality.idxmax()
highest_avg_cost = avg_cost_by_locality.max()

print(f"The locality with the highest average cost for dining is {highest_avg_cost_locality} with an average cost of {highest_avg_cost:.2f} INR.")


The locality with the highest average cost for dining is Church Street with an average cost of 770.36 INR.


## **5. Which restaurant type has the top rating with over 1000 votes?**

In [121]:
# Filter the dataset for restaurants with more than 1000 votes
high_votes_df = merged_df[merged_df['votes'] > 1000]

# Group by restaurant type and calculate the mean rating
avg_rating_by_rest_type = high_votes_df.groupby('rest_type')['rate'].mean()

# Find the restaurant type with the highest rating
top_rest_type = avg_rating_by_rest_type.idxmax()
top_rating = avg_rating_by_rest_type.max()

print(f"The restaurant type with the top rating and over 1000 votes is {top_rest_type} with a rating of {top_rating:.2f}.")


The restaurant type with the top rating and over 1000 votes is Bakery with a rating of 4.80.
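
A mean computed over very few restaurants can top the ranking, so it may help to report the group size next to each average. A minimal sketch on toy data; the threshold `min_restaurants` is a hypothetical choice, not part of the original question:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "rest_type": ["Bakery", "Casual Dining", "Casual Dining", "Cafe", "Cafe"],
    "rate": [4.8, 4.5, 4.3, 4.6, 4.0],
    "votes": [1200, 3000, 1500, 2500, 900],
})

popular = toy[toy["votes"] > 1000]

# Mean rating together with the number of restaurants behind each mean
summary = (
    popular.groupby("rest_type")["rate"]
    .agg(mean_rating="mean", restaurants="count")
    .sort_values("mean_rating", ascending=False)
)
print(summary)

# Hypothetical threshold: require at least two restaurants per type
min_restaurants = 2
robust_top = summary[summary["restaurants"] >= min_restaurants].index[0]
print("Top type with enough restaurants:", robust_top)
```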


## **6. How much does it cost at minimum to eat out in Bangalore?**

In [123]:
# Find the minimum cost for two people
min_cost = merged_df['approx_costfor_two_people'].min()

print(f"The minimum cost to eat out in Bangalore is ₹{min_cost}.")


The minimum cost to eat out in Bangalore is ₹40.


## **7. What percentage of total online orders is received by restaurants in Banashankari?**

In [128]:
# Filter for restaurants in Banashankari
banashankari_orders = merged_df[merged_df['listed_incity'] == 'Banashankari']

# Count Banashankari restaurants that accept online orders
# ('online_order' is a 0/1 flag, so the sum is a restaurant count, not an order volume)
banashankari_online_orders = banashankari_orders['online_order'].sum()

# Count restaurants that accept online orders in the entire dataset
total_online_orders = merged_df['online_order'].sum()

# Calculate the percentage of total online orders in Banashankari
percentage_banashankari = (banashankari_online_orders / total_online_orders) * 100

print(f"The percentage of total online orders received by restaurants in Banashankari is {percentage_banashankari:.2f}%.")


The percentage of total online orders received by restaurants in Banashankari is 1.79%.
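
The same share can be computed for every locality in one pass with a `groupby`, which makes it easy to compare Banashankari against the rest. A sketch on toy data (the numbers are made up):

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "listed_incity": ["Banashankari", "Banashankari", "BTM", "BTM", "BTM"],
    "online_order": [1, 0, 1, 1, 0],
})

# Restaurants accepting online orders per locality, as a percentage of the overall total
accepting = toy.groupby("listed_incity")["online_order"].sum()
share_pct = (accepting / accepting.sum() * 100).round(2)
print(share_pct.sort_values(ascending=False))
```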


## **8. Which locality has the most restaurants with over 500 votes and a rating below 3.0?**

In [135]:
# Filter the dataset for restaurants with over 500 votes and a rating below 3.0
filtered_df = merged_df[(merged_df['votes'] > 500) & (merged_df['rate'] < 3.0)]

# Group by locality and count the number of restaurants
restaurant_count_by_locality = filtered_df['listed_incity'].value_counts()

# Get the locality with the most such restaurants
most_restaurants_locality = restaurant_count_by_locality.idxmax()
most_restaurants_count = restaurant_count_by_locality.max()

print(f"The locality with the most restaurants having over 500 votes and a rating below 3.0 is {most_restaurants_locality} with {most_restaurants_count} restaurants.")


The locality with the most restaurants having over 500 votes and a rating below 3.0 is Brookefield with 8 restaurants.


## **9. Which locality in Bangalore should Zomato target for expansion based on restaurant type diversity?**

In [138]:
# Count the number of unique restaurant types for each locality
restaurant_type_diversity = merged_df.groupby('listed_incity')['rest_type'].nunique()

# Find the locality with the highest number of unique restaurant types
target_locality = restaurant_type_diversity.idxmax()
max_diversity_count = restaurant_type_diversity.max()

print(f"The locality with the most restaurant type diversity is {target_locality} with {max_diversity_count} unique restaurant types.")


The locality with the most restaurant type diversity is BTM with 62 unique restaurant types.


## **10. What's the average cost difference between buffet and delivery restaurants?**

In [140]:
# Filter the dataset for buffet and delivery restaurants
buffet_df = merged_df[merged_df['listed_intype'] == 'Buffet']
delivery_df = merged_df[merged_df['listed_intype'] == 'Delivery']

# Calculate the average cost for two people for both buffet and delivery restaurants
avg_buffet_cost = buffet_df['approx_costfor_two_people'].mean()
avg_delivery_cost = delivery_df['approx_costfor_two_people'].mean()

# Calculate the difference
cost_difference = abs(avg_buffet_cost - avg_delivery_cost)

print(f"The average cost difference between buffet and delivery restaurants is ₹{cost_difference:.2f}.")


The average cost difference between buffet and delivery restaurants is ₹831.25.
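
Since `abs()` hides which type is costlier, a signed difference taken from a single `groupby` may be more informative. A minimal sketch with made-up costs:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "listed_intype": ["Buffet", "Buffet", "Delivery", "Delivery", "Cafes"],
    "approx_costfor_two_people": [1200, 1400, 400, 500, 600],
})

# Mean cost for every listing type in one pass
mean_cost = toy.groupby("listed_intype")["approx_costfor_two_people"].mean()

# Signed difference keeps the direction: positive means buffet is costlier
difference = mean_cost["Buffet"] - mean_cost["Delivery"]
print(f"Buffet minus Delivery: ₹{difference:.2f}")
```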


## **11. What is the maximum number of votes received by any restaurant with online ordering?**

In [144]:
# Filter the dataset for restaurants with online ordering
online_order_df = merged_df[merged_df['online_order'] == 1]

# Find the maximum number of votes received by any restaurant with online ordering
max_votes_online_order = online_order_df['votes'].max()

print(f"The maximum number of votes received by any restaurant with online ordering is {max_votes_online_order}.")


The maximum number of votes received by any restaurant with online ordering is 16832.


## **12. What is the average rating of restaurants that serve both North Indian and Chinese cuisines?**

In [147]:
# Filter the dataset for restaurants that serve both North Indian and Chinese cuisines
north_indian_chinese_df = merged_df[merged_df['cuisines'].str.contains('North Indian', case=False, na=False) & 
                                    merged_df['cuisines'].str.contains('Chinese', case=False, na=False)]

# Calculate the average rating for these restaurants
avg_rating_north_indian_chinese = north_indian_chinese_df['rate'].mean()

print(f"The average rating of restaurants that serve both North Indian and Chinese cuisines is {avg_rating_north_indian_chinese:.2f}.")


The average rating of restaurants that serve both North Indian and Chinese cuisines is 3.59.


## **13. What is the most profitable area for Zomato based on potential revenue estimation?**

In [150]:
# Filter for only the relevant areas (.copy() avoids SettingWithCopyWarning on the next assignment)
target_areas = ['Brookefield', 'Koramangala 8th Block', 'Koramangala 7th Block', 'Bellandur']
subset = merged_df[merged_df['listed_incity'].isin(target_areas)].copy()

# Estimate potential revenue per restaurant as votes × cost for two (a rough proxy)
subset['revenue_estimate'] = subset['votes'] * subset['approx_costfor_two_people']
revenue_by_area = subset.groupby('listed_incity')['revenue_estimate'].sum().sort_values(ascending=False)

# Display the results
print(revenue_by_area)

listed_incity
Koramangala 7th Block    1006195610
Bellandur                 416443410
Brookefield               263125500
Name: revenue_estimate, dtype: int64
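
Only three of the four requested areas appear in the result; Koramangala 8th Block is absent, which suggests no rows carry that exact name in `listed_incity`. The sketch below, on toy data, shows one way to surface such gaps and to build the derived column with `assign` on a fresh frame:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "listed_incity": ["Bellandur", "Bellandur", "Brookefield", "BTM"],
    "votes": [100, 50, 20, 10],
    "approx_costfor_two_people": [500, 800, 300, 200],
})
target_areas = ["Brookefield", "Koramangala 8th Block", "Bellandur"]

# .assign returns a new frame, so the filtered slice is never modified in place
subset = (
    toy[toy["listed_incity"].isin(target_areas)]
    .assign(revenue_estimate=lambda d: d["votes"] * d["approx_costfor_two_people"])
)
revenue_by_area = (
    subset.groupby("listed_incity")["revenue_estimate"].sum().sort_values(ascending=False)
)

# Requested areas that never occur in the data drop out of the result silently
missing_areas = sorted(set(target_areas) - set(toy["listed_incity"]))
print(revenue_by_area)
print("Not found:", missing_areas)
```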


## **14. If Zomato wants to reduce customer complaints, which restaurant type should they focus on?**

In [153]:
# Define the restaurant types in the options
restaurant_types = ['Bakery, Beverage Shop', 'Sweet Shop, Quick Bites', 'Quick Bites', 'Fine Dining']

# Filter the dataset for these restaurant types
filtered_rest_types = merged_df[merged_df['rest_type'].isin(restaurant_types)]

# Group by restaurant type and calculate the average rating for each type
avg_rating_filtered = filtered_rest_types.groupby('rest_type')['rate'].mean()

# Find the restaurant type with the lowest average rating
worst_rest_type = avg_rating_filtered.idxmin()
lowest_rating = avg_rating_filtered.min()

print(f"The restaurant type Zomato should focus on to reduce customer complaints is {worst_rest_type} with an average rating of {lowest_rating:.2f}.")


The restaurant type Zomato should focus on to reduce customer complaints is Quick Bites with an average rating of 3.59.


## **15. In which area should Zomato invest by considering high rating (rate > 4.2), high number of votes (> 500) and including online orders?**

In [156]:
# Filter the dataset for restaurants that meet all the conditions
high_rating_votes_online_df = merged_df[(merged_df['rate'] > 4.2) &
                                         (merged_df['votes'] > 500) &
                                         (merged_df['online_order'] == 1)]

# Group by locality and count the number of qualifying restaurants in each area
area_investment = high_rating_votes_online_df.groupby('listed_incity').size()

# Find the locality with the highest number of qualifying restaurants
best_area_to_invest = area_investment.idxmax()

print(f"The area Zomato should invest in based on high rating, votes, and online orders is {best_area_to_invest}.")


The area Zomato should invest in based on high rating, votes, and online orders is Koramangala 7th Block.


**By Sharmila Lakkimsetti**