# Zomato Preference Analysis

Dataset Link: https://www.kaggle.com/datasets/rrkcoder/zomato-data-40k-restaurants-of-indias-100-cities/download?datasetVersionNumber=1


In [1]:
import pandas as pd
import numpy as np

In [2]:
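#- Read the Zomato dataset (restaurants, ratings, cuisines, and related details) for the user preference analysis.
#- dataset_link is not defined in this notebook; it is assumed to hold the path to the CSV downloaded from the Kaggle link above.
data=pd.read_csv(dataset_link,index_col=False)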

In [3]:
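# Converting Data to Pandas DataFrame and Displaying Information

## Objective:
#- Bind the loaded Zomato dataset to df. pd.read_csv already returns a DataFrame, so the pd.DataFrame(data) call is not strictly required.
#- Display essential information about the DataFrame.
#- Note: df.info() prints its report itself and returns None, which is why the output below ends with "None".

df=pd.DataFrame(data)
print(df.info())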

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44891 entries, 0 to 44890
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Restaurant Name        44891 non-null  object
 1   Rating                 44891 non-null  object
 2   Cuisine                44872 non-null  object
 3   Average Price          44891 non-null  object
 4   Average Delivery Time  44891 non-null  object
 5   Safety Measure         44891 non-null  object
 6   Location               44891 non-null  object
dtypes: object(7)
memory usage: 2.4+ MB
None


In [4]:
# Checking for Missing Values in the Zomato DataFrame

## Objective:
#- Examine the dataset for missing values to assess data integrity and plan for data cleaning if necessary.

print(df.isnull().sum())

Restaurant Name           0
Rating                    0
Cuisine                  19
Average Price             0
Average Delivery Time     0
Safety Measure            0
Location                  0
dtype: int64


In [5]:
# Displaying the Initial Rows of the Zomato DataFrame

## Objective:
#- Provide a preview of the dataset by displaying the first few rows of the Zomato DataFrame.

print(df.head())

                       Restaurant Name Rating  \
0                        Campus Bakers    4.3   
1       Mama Chicken Mama Franky House      4   
2     GMB - Gopika Sweets & Restaurant    4.2   
3  Shree Bankey Bihari Misthan Bhandar    4.2   
4                          Burger King    4.2   

                                             Cuisine Average Price  \
0         Bakery, Fast Food, Pizza, Sandwich, Burger   ₹50 for one   
1        North Indian, Mughlai, Rolls, Burger, Momos   ₹50 for one   
2  North Indian, South Indian, Chinese, Fast Food...   ₹50 for one   
3  Mithai, Street Food, South Indian, Chinese, Ic...   ₹50 for one   
4                       Burger, Fast Food, Beverages   ₹50 for one   

  Average Delivery Time                                     Safety Measure  \
0                36 min            Restaurant partner follows WHO protocol   
1                22 min  Follows all Max Safety measures to ensure your...   
2                27 min  Follows all Max Safety me

In [6]:
# Displaying Column Names of the Zomato DataFrame

## Objective:
#- Obtain a list of column names in the Zomato DataFrame for reference and further analysis.

df.columns

Index(['Restaurant Name', 'Rating', 'Cuisine', 'Average Price',
       'Average Delivery Time', 'Safety Measure', 'Location'],
      dtype='object')

In [7]:
# Filtering Zomato DataFrame based on Rating Values

## Objective:
#- Remove rows from the Zomato DataFrame where the 'Rating' column contains specific values from the given list.

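# Placeholder strings that appear in 'Rating' instead of a number.
# The name invalid_ratings avoids shadowing the built-in list type.
invalid_ratings=[" ","_","New","-"]
df=df[~df["Rating"].isin(invalid_ratings)]
print(df)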

                              Restaurant Name Rating  \
0                               Campus Bakers    4.3   
1              Mama Chicken Mama Franky House      4   
2            GMB - Gopika Sweets & Restaurant    4.2   
3         Shree Bankey Bihari Misthan Bhandar    4.2   
4                                 Burger King    4.2   
...                                       ...    ...   
44873    Country Oven Multi Cusine Restaurant    3.7   
44874                    KAKATIYA Grand Udupi    4.1   
44876  S K Point A Complete Family Restaurant    4.3   
44878              Madhuram Tiffins And Meals    4.1   
44883                        MS Tiffin Center    4.0   

                                                 Cuisine Average Price  \
0             Bakery, Fast Food, Pizza, Sandwich, Burger   ₹50 for one   
1            North Indian, Mughlai, Rolls, Burger, Momos   ₹50 for one   
2      North Indian, South Indian, Chinese, Fast Food...   ₹50 for one   
3      Mithai, Street Food, Sou
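
A possible alternative to listing the placeholder strings by hand is to let pandas coerce anything that is not a number to NaN and then drop those rows. The sketch below is a minimal illustration on a small made-up sample (the restaurant names are hypothetical, not rows from the dataset); it is not the method used in this notebook.

```python
import pandas as pd

# Hypothetical sample; "New" and "-" mirror two of the placeholders filtered above.
sample = pd.DataFrame({
    "Restaurant Name": ["Restaurant A", "Restaurant B", "Restaurant C", "Restaurant D"],
    "Rating": ["4.3", "New", "-", "3.9"],
})

# errors="coerce" turns every value that cannot be parsed as a number into NaN.
numeric_rating = pd.to_numeric(sample["Rating"], errors="coerce")

# Keep only the rows whose rating parsed, and store the numeric values.
rated = sample[numeric_rating.notna()].assign(Rating=numeric_rating)
print(rated)
```

This also converts the column to a numeric dtype in the same step, which the notebook otherwise does later with astype.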

In [8]:
# Filtering Zomato DataFrame to Exclude Rows with Null Cuisine Values

## Objective:
#- Remove rows from the Zomato DataFrame where the 'Cuisine' column contains null (missing) values.

df=df[df["Cuisine"].notnull()]

In [9]:
# Displaying Rows with Null Values in the 'Location' Column

## Objective:
#- Print the rows in the Zomato DataFrame where the 'Location' column contains null (missing) values.

print(df[df["Location"].isnull()])

Empty DataFrame
Columns: [Restaurant Name, Rating, Cuisine, Average Price, Average Delivery Time, Safety Measure, Location]
Index: []


In [10]:
print(df)

                              Restaurant Name Rating  \
0                               Campus Bakers    4.3   
1              Mama Chicken Mama Franky House      4   
2            GMB - Gopika Sweets & Restaurant    4.2   
3         Shree Bankey Bihari Misthan Bhandar    4.2   
4                                 Burger King    4.2   
...                                       ...    ...   
44873    Country Oven Multi Cusine Restaurant    3.7   
44874                    KAKATIYA Grand Udupi    4.1   
44876  S K Point A Complete Family Restaurant    4.3   
44878              Madhuram Tiffins And Meals    4.1   
44883                        MS Tiffin Center    4.0   

                                                 Cuisine Average Price  \
0             Bakery, Fast Food, Pizza, Sandwich, Burger   ₹50 for one   
1            North Indian, Mughlai, Rolls, Burger, Momos   ₹50 for one   
2      North Indian, South Indian, Chinese, Fast Food...   ₹50 for one   
3      Mithai, Street Food, Sou

In [11]:
# Extracting Numerical Values from 'Average Price' Column

## Objective:
#- Extract the numeric amount from the 'Average Price' column, whose values look like "₹50 for one".
#- Only the first run of digits after the rupee symbol (₹) is captured; rows without a match become NaN.

df["Average Price"]=df["Average Price"].str.extract(r'₹(\d+)', expand=False)
print(df)

                              Restaurant Name Rating  \
0                               Campus Bakers    4.3   
1              Mama Chicken Mama Franky House      4   
2            GMB - Gopika Sweets & Restaurant    4.2   
3         Shree Bankey Bihari Misthan Bhandar    4.2   
4                                 Burger King    4.2   
...                                       ...    ...   
44873    Country Oven Multi Cusine Restaurant    3.7   
44874                    KAKATIYA Grand Udupi    4.1   
44876  S K Point A Complete Family Restaurant    4.3   
44878              Madhuram Tiffins And Meals    4.1   
44883                        MS Tiffin Center    4.0   

                                                 Cuisine Average Price  \
0             Bakery, Fast Food, Pizza, Sandwich, Burger            50   
1            North Indian, Mughlai, Rolls, Burger, Momos            50   
2      North Indian, South Indian, Chinese, Fast Food...            50   
3      Mithai, Street Food, Sou
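
To make the behaviour of the extraction step concrete, here is a small sketch on made-up price strings (hypothetical values, chosen only to follow the "₹<amount> for one" pattern visible in the output). It also shows one way to turn the extracted text into numbers, which this notebook defers to the later astype cell.

```python
import pandas as pd

# Hypothetical price strings; the last one has no rupee amount at all.
prices = pd.Series(["₹50 for one", "₹250 for one", "no price listed"])

# With one capture group and expand=False the result is a Series;
# rows that do not match the pattern become NaN.
amount = prices.str.extract(r"₹(\d+)", expand=False)

# The extracted values are still strings, so convert before doing arithmetic.
amount = pd.to_numeric(amount)
print(amount)
```

Because unmatched rows become NaN, a null check on 'Average Price' after this step is what removes rows without a usable price.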

In [12]:
# Filtering Zomato DataFrame Based on 'Average Delivery Time'

## Objective:
#- Remove rows where 'Average Delivery Time' holds an opening notice (text starting with "Opens") instead of a duration.
#- Rows starting with "Currently" and other status phrases are removed in the cells that follow.

df=df[~df["Average Delivery Time"].str.startswith("Opens")]

In [13]:
print(df)

                           Restaurant Name Rating  \
0                            Campus Bakers    4.3   
1           Mama Chicken Mama Franky House      4   
2         GMB - Gopika Sweets & Restaurant    4.2   
3      Shree Bankey Bihari Misthan Bhandar    4.2   
4                              Burger King    4.2   
...                                    ...    ...   
44832               The Belgian Waffle Co.    4.5   
44843                        Sneha Tiffins    4.3   
44844                          Chop Sticks    3.7   
44853                 KAKATIYA Grand Hotel    4.2   
44858                     Peacock FastFood    4.0   

                                                 Cuisine Average Price  \
0             Bakery, Fast Food, Pizza, Sandwich, Burger            50   
1            North Indian, Mughlai, Rolls, Burger, Momos            50   
2      North Indian, South Indian, Chinese, Fast Food...            50   
3      Mithai, Street Food, South Indian, Chinese, Ic...           

In [14]:
# Further Filtering Zomato DataFrame Based on 'Average Delivery Time'

## Objective:
#- Continue refining the Zomato DataFrame by removing rows where the 'Average Delivery Time' column starts with specific phrases.


df=df[~df["Average Delivery Time"].str.startswith("Currently")]
df=df[~df["Average Delivery Time"].str.startswith("Temporarily")]
df=df[~df["Average Delivery Time"].str.startswith("Opening")]

In [15]:
df=df[~df["Average Delivery Time"].str.startswith("Closes")]
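
The five prefix filters above ("Opens", "Currently", "Temporarily", "Opening", "Closes") could also be expressed as a single pattern. The following is a minimal sketch on made-up values (the status texts after each prefix are hypothetical); it assumes the prefixes are the only status markers in the column.

```python
import pandas as pd

# Hypothetical column values; only the leading words match the prefixes used above.
times = pd.Series([
    "36 min",
    "Opens at 11am",
    "Currently not accepting orders",
    "Temporarily closed",
    "Opening soon",
    "Closes in 10 min",
    "22 min",
])

# Series.str.match anchors the pattern at the start of each string.
status_pattern = r"(?:Opens|Currently|Temporarily|Opening|Closes)"
is_status = times.str.match(status_pattern)

# Keep only the rows that are not status messages.
delivery_only = times[~is_status]
print(delivery_only.tolist())
```

Keeping the prefixes in one pattern makes it easier to see which status messages are excluded and to add another one later.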

In [18]:
# Extracting 'Average Delivery Time' Values

## Objective:
#- Extract the number of minutes from the 'Average Delivery Time' column (the values remain strings at this point) and display the rows whose delivery time is "10".

df["Average Delivery Time"]=df["Average Delivery Time"].str.extract(r'(\d+)', expand=False)
print(df[df["Average Delivery Time"]=="10"])

                                        Restaurant Name Rating  \
1109                    Classic Derani Jethani Icecream    4.4   
1307                               Mamta Cake And Bakes    4.1   
3120                      New Brijwasi Fast Food Corner    4.0   
5304                           Rasmalai Sweet & Namkeen    4.1   
5369                                          Cake Walk    4.2   
13334                                             ibaco    4.4   
13423                                        Lassi Shop    3.9   
16241                                      Ravi Alpahar    4.3   
26153                             Ludhiana Egg Parantha    3.9   
27223                                    NIC Ice Creams    4.5   
27226                            Crave Desserts & Bakes    4.2   
37216                                   Arun Ice Creams    4.5   
37224  Kwality Wall’s Frozen Dessert and Ice Cream Shop    4.3   
38563                              Sea Rock Chaat House    3.5   
39820     

In [19]:
# Counting Null Values in the 'Safety Measure' Column

## Objective:
#- Determine and print the number of null (missing) values in the 'Safety Measure' column.

print(df["Safety Measure"].isnull().sum())

0


In [20]:
# Categorizing 'Safety Measure' Values

## Objective:
#- Categorize values in the 'Safety Measure' column based on whether they start with "Follows" or not.

def rep(x):
    # Labels starting with "Follows" are grouped as "safety"; every other label is treated as "WHO".
    if x.startswith("Follows"):
        return "safety"
    return "WHO"

df["Safety Measure"]=df["Safety Measure"].apply(rep)
print(df)

                           Restaurant Name Rating  \
0                            Campus Bakers    4.3   
1           Mama Chicken Mama Franky House      4   
2         GMB - Gopika Sweets & Restaurant    4.2   
3      Shree Bankey Bihari Misthan Bhandar    4.2   
4                              Burger King    4.2   
...                                    ...    ...   
44804         RRR Multi Cuisine Restaurant    4.0   
44806                   Puliyogare Company    2.9   
44810                         Sultan Mandi    3.2   
44813                   Candy Eats &Treats    3.8   
44814                         Mandi Darbar    3.7   

                                                 Cuisine Average Price  \
0             Bakery, Fast Food, Pizza, Sandwich, Burger            50   
1            North Indian, Mughlai, Rolls, Burger, Momos            50   
2      North Indian, South Indian, Chinese, Fast Food...            50   
3      Mithai, Street Food, South Indian, Chinese, Ic...           
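
The same two-way labelling can be written without a Python-level function by combining the vectorised str.startswith with numpy.where. This is only a sketch on two made-up strings (the second one is a hypothetical shortened label), shown as an alternative to rep rather than a replacement for it.

```python
import numpy as np
import pandas as pd

# Hypothetical 'Safety Measure' values.
measures = pd.Series([
    "Restaurant partner follows WHO protocol",
    "Follows all Max Safety measures",
])

# np.where picks "safety" where the condition holds and "WHO" everywhere else.
labels = pd.Series(
    np.where(measures.str.startswith("Follows"), "safety", "WHO"),
    index=measures.index,
)
print(labels.tolist())
```

Note that the check is case-sensitive, so "follows" in the middle of the first string does not count as a match.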

In [21]:
# Counting Null Values in the 'Location' Column

## Objective:
#- Determine and print the number of null (missing) values in the 'Location' column.

print(df["Location"].isnull().sum())

0


In [23]:
print(df.dtypes)

Restaurant Name          object
Rating                   object
Cuisine                  object
Average Price            object
Average Delivery Time    object
Safety Measure           object
Location                 object
dtype: object


In [24]:
# Drop rows where no price could be extracted (NaN in 'Average Price').
df=df[df['Average Price'].notnull()]

In [25]:
# Counting Null Values in the 'Average Delivery Time' Column

## Objective:
#- Determine and print the number of null (missing) values in the 'Average Delivery Time' column.

print(df['Average Delivery Time'].isnull().values.sum())

7


In [26]:
df=df[df['Average Delivery Time'].isnull()==False]

In [27]:
print(df['Average Delivery Time'].isnull().values.sum())

0


In [28]:
print(df.isnull().values.any())

False


In [29]:
print(df[df["Average Delivery Time"]=="10"])

                                        Restaurant Name Rating  \
1109                    Classic Derani Jethani Icecream    4.4   
1307                               Mamta Cake And Bakes    4.1   
3120                      New Brijwasi Fast Food Corner    4.0   
5304                           Rasmalai Sweet & Namkeen    4.1   
5369                                          Cake Walk    4.2   
13334                                             ibaco    4.4   
13423                                        Lassi Shop    3.9   
16241                                      Ravi Alpahar    4.3   
26153                             Ludhiana Egg Parantha    3.9   
27223                                    NIC Ice Creams    4.5   
27226                            Crave Desserts & Bakes    4.2   
37216                                   Arun Ice Creams    4.5   
37224  Kwality Wall’s Frozen Dessert and Ice Cream Shop    4.3   
38563                              Sea Rock Chaat House    3.5   
39820     

In [30]:
# Displaying the Dimensions of the DataFrame

## Objective:
#- Print the number of rows and columns in the DataFrame.
print(df.shape)

(34855, 7)


In [31]:
# Converting Data Types of DataFrame Columns

## Objective:
#- Modify the data types of specific columns in the DataFrame according to a predefined dictionary.

typ={"Restaurant Name":str,"Rating":float,"Cuisine":str,"Average Price":float,"Average Delivery Time":int,"Safety Measure":str,"Location":str}
df=df.astype(typ)
print(df.dtypes)

Restaurant Name           object
Rating                   float64
Cuisine                   object
Average Price            float64
Average Delivery Time      int32
Safety Measure            object
Location                  object
dtype: object


In [32]:
# Displaying Rows with 'Average Delivery Time' Equal to 1

## Objective:
#- Print rows from the DataFrame where the 'Average Delivery Time' column has a value equal to 1.

print(df[df["Average Delivery Time"]==1])

Empty DataFrame
Columns: [Restaurant Name, Rating, Cuisine, Average Price, Average Delivery Time, Safety Measure, Location]
Index: []


In [33]:
print(df['Restaurant Name'].isnull().values.sum())

0


In [34]:
# Calculating and Displaying Correlation Matrix

## Objective:
#- Calculate and print the correlation matrix for numeric columns in the DataFrame.

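# numeric_only=True restricts the calculation to the numeric columns; recent pandas versions no longer skip text columns silently.
print(df.corr(numeric_only=True))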

                         Rating  Average Price  Average Delivery Time
Rating                 1.000000      -0.004094              -0.101681
Average Price         -0.004094       1.000000               0.060411
Average Delivery Time -0.101681       0.060411               1.000000
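
All off-diagonal coefficients above are close to zero (the largest in magnitude is about -0.10), so none of the three columns shows a strong linear relationship with another. For reference, the sketch below shows one way to restrict a correlation matrix to numeric columns explicitly; the sample values are made up and carry no meaning beyond illustrating the calls.

```python
import pandas as pd

# Hypothetical sample with one text column and three numeric columns.
sample = pd.DataFrame({
    "Restaurant Name": ["Restaurant A", "Restaurant B", "Restaurant C", "Restaurant D"],
    "Rating": [4.3, 4.0, 3.7, 4.5],
    "Average Price": [50.0, 100.0, 150.0, 200.0],
    "Average Delivery Time": [36, 22, 27, 30],
})

# select_dtypes keeps only the numeric columns, so corr never sees text data.
numeric = sample.select_dtypes(include="number")

# Pearson correlation is the default method of DataFrame.corr.
corr = numeric.corr()
print(corr.round(3))
```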



In [35]:
#df.plot.bar(x="Average Price",y="Average Delivery Time")

In [36]:
#df.to_csv("zomato_dataset_analysis.csv",index=False)
df['Index'] = range(1, len(df) + 1)

In [37]:
#df.to_csv("zomato_dataset_cusine.csv",index=False,columns=["Index","Restaurant Name","Cuisine"])
#df.to_csv("zomato_FULLDATA.csv",index=False)

In [38]:
df.shape

(34855, 8)

In [39]:
df.iloc[0,2]

'Bakery, Fast Food, Pizza, Sandwich, Burger'

In [40]:
# Extracting Unique Words from 'Cuisine' Column

## Objective:
#- Define a function to extract unique words from the 'Cuisine' column in the DataFrame.

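import re
def word_search():
    # Collect cuisine names in the order they are first seen.
    y=[]
    for i in range(len(df)):
        words=df.iloc[i,2].split(',')
        for w in words:
            name=w.strip()
            # Note: this is a substring test against the text of the whole list, so a name
            # that is contained in an already collected name is skipped.
            # re.escape keeps characters in the name from being read as regex syntax.
            if not re.search(re.escape(name),str(y)):
                y.append(name)
    print(y)
    return y
z=word_search()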

['Bakery', 'Fast Food', 'Pizza', 'Sandwich', 'Burger', 'North Indian', 'Mughlai', 'Rolls', 'Momos', 'South Indian', 'Chinese', 'Street Food', 'Mithai', 'Ice Cream', 'Desserts', 'Beverages', 'Pasta', 'Shake', 'Biryani', 'Continental', 'Cafe', 'Healthy Food', 'Salad', 'Wraps', 'Waffle', 'Juices', 'Italian', 'Kebab', 'American', 'Tea', 'Hyderabadi', 'Coffee', 'Shawarma', 'Thai', 'Asian', 'Sichuan', 'Lucknowi', 'Lebanese', 'BBQ', 'Seafood', 'Rajasthani', 'Sushi', 'Pancake', 'Korean', 'Mediterranean', 'Mexican', 'Maharashtrian', 'European', 'Oriental', 'Turkish', 'Mishti', 'Gujarati', 'Panini', 'Tibetan', 'Hot dogs', 'Finger Food', 'Kathiyawadi', 'Roast Chicken', 'Cake', 'Japanese', 'Kerala', 'Bubble Tea', 'Grilled Chicken', 'Arabian', 'Middle Eastern', 'Bengali', 'Awadhi', 'Parsi', 'Bar Food', 'Afghan', 'Modern Indian', 'French', 'Bihari', 'Sindhi', 'Paan', 'Saoji', 'Chettinad', 'Steak', 'Burmese', 'Tex-Mex', 'Andhra', 'Naga', 'Indonesian', 'Odia', 'Tamil', 'Mangalorean', 'Vietnamese', 'No

In [41]:
len(z)

122
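
Because word_search tests each name as a substring of the collected list, a name can be skipped when a longer name containing it was seen first. A possible exact-match alternative, sketched here on made-up rows, splits and explodes the column and then takes the unique values. On the real data this may return a different count than the 122 shown above; that has not been checked here.

```python
import pandas as pd

# Hypothetical 'Cuisine' values; "Tea" appears only after "Bubble Tea".
sample = pd.DataFrame({
    "Cuisine": [
        "Bakery, Fast Food, Pizza",
        "North Indian, Fast Food",
        "Bubble Tea, Tea",
    ]
})

# Split on commas, give every cuisine its own row, and trim whitespace.
exploded = sample["Cuisine"].str.split(",").explode().str.strip()

# unique() keeps first-seen order and compares whole names, not substrings.
unique_cuisines = exploded.unique().tolist()
print(unique_cuisines)
```

With the substring test, "Tea" in the last row would be skipped because "Bubble Tea" is already in the list; the exact comparison keeps both.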

In [42]:
#cuisine=pd.DataFrame(index=df["Restaurant Name"],columns=z)
#cuisine.to_csv("cusine.csv")
#34855

In [43]:
import re
#def word_search():
 #   for i in range(0,122):
  #      for j in range(0,34928):
   #         words=df.iloc[j,2].split(',')
    #        for w in words:
     #           if re.search(z[i],w.strip()):
      #              cuisine.iloc[j,i]=1
       #             break
        #        else:
         #           cuisine.iloc[j,i]=0
                    
        
#word_search()
#cuisine.to_csv("cusine.csv")

In [44]:
print(df.iloc[1,7])

2


In [45]:
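cuisine_1=pd.DataFrame(columns=["Restaurant_Names","Cusine","Index"])
# `cuisine` is only created in the commented-out cell In [42], so this export is disabled to avoid a NameError.
#cuisine.to_csv("cusine.csv")
#34855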

In [55]:
# Reading Cuisine Image URLs from CSV and Printing the Pizza URLs

## Objective:
#- Read cuisine image URLs from a CSV file and print the URLs associated with the "Pizza" cuisine.

images_cusine=pd.read_csv("cusine_images.csv")
print(images_cusine)
# Select by column name ('Cusine', 'images') rather than by column position.
for url in images_cusine.loc[images_cusine["Cusine"]=="Pizza","images"]:
    print(url)

     Unnamed: 0      Cusine                                             images
0             0      Bakery  https://b.zmtcdn.com/data/o2_assets/65352e5234...
1             1   Fast Food  https://b.zmtcdn.com/data/dish_photos/4bb/b64f...
2             2       Pizza  https://media-assets.swiggy.com/swiggy/image/u...
3             3    Sandwich  https://media-assets.swiggy.com/swiggy/image/u...
4             4      Burger  https://media-assets.swiggy.com/swiggy/image/u...
..          ...         ...                                                ...
117         117     Iranian  https://b.zmtcdn.com/data/o2_assets/f9be378a14...
118         118      German  https://media-assets.swiggy.com/swiggy/image/u...
119         119  Sri Lankan  https://b.zmtcdn.com/data/o2_assets/f9be378a14...
120         120    Moroccan  https://b.zmtcdn.com/data/o2_assets/f9be378a14...
121         121     Russian  https://img.freepik.com/free-photo/traditional...

[122 rows x 3 columns]
https://media-assets.swiggy.

In [58]:
# Searching for Cuisines in 'Cuisine' Column and Creating a New DataFrame

## Objective:
#- Search for specific cuisines in the 'Cuisine' column of the main DataFrame and create a new DataFrame with additional information.

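import re
cus=[]
res=[]
ind=[]
imgs=[]
def word_search():
    # For each known cuisine z[i], scan every restaurant and record the first listed
    # cuisine that contains z[i] (substring match), together with the restaurant
    # name, its Index value, and the matching image URL.
    for i in range(len(z)):
        for j in range(len(df)):
            words=df.iloc[j,2].split(',')
            for w in words:
                if re.search(re.escape(z[i]),w.strip()):
                    res.append(df.iloc[j,0])
                    cus.append(w.strip())
                    for img in images_cusine.index:
                        if(images_cusine.iloc[img,1]==w.strip()):
                            imgs.append(images_cusine.iloc[img,2])
                            break
                    else:
                        # No image row for this cuisine: keep the four lists the same length.
                        imgs.append(None)
                    ind.append(df.iloc[j,7])
                    break


word_search()
cuis=pd.DataFrame({"Restaurant_Names":res,"Cusine":cus,"Index":ind,"Images":imgs})
print(cuis)
cuis.to_csv("cusine_FULLDATA_new.csv")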

               Restaurant_Names      Cusine  Index  \
0                 Campus Bakers      Bakery      1   
1              Goverdhan Bakery      Bakery      7   
2                     Cake Wala      Bakery     10   
3                    Cake House      Bakery     13   
4                 Bhagat Halwai      Bakery     15   
...                         ...         ...    ...   
129818  German Bakery Wunderbar      German  27644   
129819            Ahare 8 Khana  Sri Lankan  29987   
129820                 Roastown    Moroccan  31295   
129821                Cofi Club     Russian  31933   
129822          Medovika Bakers     Russian  33597   

                                                   Images  
0       https://b.zmtcdn.com/data/o2_assets/65352e5234...  
1       https://b.zmtcdn.com/data/o2_assets/65352e5234...  
2       https://b.zmtcdn.com/data/o2_assets/65352e5234...  
3       https://b.zmtcdn.com/data/o2_assets/65352e5234...  
4       https://b.zmtcdn.com/data/o2_assets/65352e5
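
The nested loops above visit every cuisine for every restaurant. A vectorised way to build a similar long table is to explode the 'Cuisine' column and merge the image table onto it. The sketch below uses made-up rows and placeholder URLs (all names and links are hypothetical). It matches cuisine names exactly and orders rows by restaurant, so it is related to, but not identical with, the output of word_search.

```python
import pandas as pd

# Hypothetical restaurants; column names follow the notebook's DataFrame.
restaurants = pd.DataFrame({
    "Index": [1, 2],
    "Restaurant Name": ["Restaurant A", "Restaurant B"],
    "Cuisine": ["Bakery, Pizza", "Burger, Fast Food"],
})

# Hypothetical image table; the URLs are placeholders, not real assets.
images = pd.DataFrame({
    "Cusine": ["Bakery", "Pizza", "Burger"],
    "images": [
        "https://example.com/bakery.png",
        "https://example.com/pizza.png",
        "https://example.com/burger.png",
    ],
})

# One row per (restaurant, cuisine) pair.
long_table = restaurants.assign(
    Cusine=restaurants["Cuisine"].str.split(",")
).explode("Cusine")
long_table["Cusine"] = long_table["Cusine"].str.strip()

# how="left" keeps cuisines without an image; their 'images' value is NaN.
cuis_sketch = long_table.merge(images, on="Cusine", how="left")[
    ["Restaurant Name", "Cusine", "Index", "images"]
]
print(cuis_sketch)
```

A left merge also makes missing images visible as NaN instead of silently shifting the image list out of step with the other columns.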

In [59]:
cuis.to_csv("cusine_fulldataset_new.csv")

In [None]:
cusine_images=pd.DataFrame({"Cusine":z})
#cusine_images.to_csv("cusine_images.csv")

In [None]:
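import requests
image=[]
# Probe a candidate banner URL for each of the first 70 cuisines.
# A link is kept unless the server answers 404; otherwise a blank placeholder is stored.
for i in range(0,70):
    word=z[i].replace(" ", "_")
    link="https://media-assets.swiggy.com/swiggy/image/upload/fl_lossy,f_auto,q_auto,w_288,h_360/v1674029859/PC_Creative%20refresh/3D_bau/banners_new/"+word+".png"
    # timeout keeps a slow or unresponsive server from blocking the loop indefinitely.
    response = requests.head(link, timeout=10)
    print(i)
    if response.status_code != 404:
        image.append(link)
    else:
        image.append(" ")
print(image)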
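
The probing loop above can be split into small functions so that the URL construction and the keep-or-blank logic can be checked without network access. The sketch below is an assumption-laden variant: it uses the standard-library urllib instead of requests, a hypothetical base URL (example.com), and hypothetical helper names (candidate_url, url_exists, collect_links). The demonstration at the end uses a stand-in checker, so no request is actually sent.

```python
import urllib.error
import urllib.request

# Hypothetical base URL used only for this illustration.
BASE = "https://example.com/banners/"

def candidate_url(cuisine, base=BASE):
    """Build the candidate image URL: spaces in the cuisine name become underscores."""
    return base + cuisine.replace(" ", "_") + ".png"

def url_exists(url, timeout=10):
    """Send a HEAD request; treat any status other than 404 as 'exists'."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status != 404
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers; only 404 counts as missing.
        return err.code != 404
    except urllib.error.URLError:
        # Network-level failure (DNS, refused connection): no usable link.
        return False

def collect_links(cuisines, exists=url_exists):
    """Return the candidate URL for each cuisine, or " " when the check fails."""
    return [url if exists(url) else " " for url in map(candidate_url, cuisines)]

# Offline demonstration with a stand-in checker instead of real requests.
def fake_exists(url):
    return url.endswith("Fast_Food.png")

links = collect_links(["Fast Food", "Pizza"], exists=fake_exists)
print(links)
```

Passing the checker as a parameter is a design choice for testability; calling collect_links without the exists argument would perform real HEAD requests.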