
# **Problem Statement**

Zomato is an Indian restaurant aggregator and food delivery start-up founded by Deepinder Goyal and Pankaj Chaddah in 2008. Zomato provides information, menus and user-reviews of restaurants, and also has food delivery options from partner restaurants in select cities.

India is well known for its diverse cuisines, served in a large number of restaurants and hotel resorts, which is reminiscent of unity in diversity. The restaurant business in India is always evolving. More Indians are warming up to the idea of eating restaurant food, whether by dining out or getting food delivered. The growing number of restaurants in every state of India has been a motivation to inspect the data for insights, interesting facts and figures about the Indian food industry in each city. So, this project focuses on analysing the Zomato restaurant data for each city in India.

The project focuses on customers and the company. The task is to analyse the sentiment of the customer reviews in the data and to draw useful conclusions in the form of visualizations, and also to cluster the Zomato restaurants into different segments. The data is visualized because visualization makes it easier to analyse at a glance. The analysis also addresses business cases that can directly help customers find the best restaurant in their locality and help the company grow and work on the areas in which it is currently lagging.

The restaurant data could help in clustering the restaurants into segments. It also has valuable information about cuisine and cost, which can be used in cost vs. benefit analysis.

The review data could be used for sentiment analysis. The reviewer metadata can also be used to identify the critics in the industry.

# **Attribute Information**

## **Zomato Restaurant names and Metadata**
Use this dataset for the clustering part.

1. Name : Name of Restaurants

2. Links : URL Links of Restaurants

3. Cost : Per person estimated Cost of dining

4. Collections : Tagging of Restaurants w.r.t. Zomato categories

5. Cuisines : Cuisines served by Restaurants

6. Timings : Restaurant Timings

## **Zomato Restaurant reviews**
Merge this dataset with the Names and Metadata dataset, then use it for the sentiment analysis part.

1. Restaurant : Name of the Restaurant

2. Reviewer : Name of the Reviewer

3. Review : Review Text

4. Rating : Rating Provided by Reviewer

5. Metadata : Reviewer Metadata - No. of Reviews and Followers

6. Time: Date and Time of Review

7. Pictures : No. of pictures posted with review

# Importing all the important Libraries and the Dataset

In [295]:
# importing all the important libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
import time
#from fuzzywuzzy import process, fuzz

In [296]:
# To suppress scientific notation.
pd.options.display.float_format = '{:.2f}'.format 

pd.set_option('display.max_columns', None)

In [297]:
# Mounting the Google Drive folders to google colab notebook
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [298]:
meta_df = pd.read_csv('https://raw.githubusercontent.com/NaithaniAR/ZOMATO-RESTAURANT-CLUSTERING-AND-SENTIMENT-ANALYSIS/main/Zomato%20Restaurant%20names%20and%20Metadata.csv')
reviews_df = pd.read_csv('https://raw.githubusercontent.com/NaithaniAR/ZOMATO-RESTAURANT-CLUSTERING-AND-SENTIMENT-ANALYSIS/main/Zomato%20Restaurant%20reviews.csv')

---
# Dataset inspection
---

## Meta Data

In [299]:
# to get the first five rows of the data set 
meta_df.head()

Unnamed: 0,Name,Links,Cost,Collections,Cuisines,Timings
0,Beyond Flavours,https://www.zomato.com/hyderabad/beyond-flavou...,800,"Food Hygiene Rated Restaurants in Hyderabad, C...","Chinese, Continental, Kebab, European, South I...","12noon to 3:30pm, 6:30pm to 11:30pm (Mon-Sun)"
1,Paradise,https://www.zomato.com/hyderabad/paradise-gach...,800,Hyderabad's Hottest,"Biryani, North Indian, Chinese",11 AM to 11 PM
2,Flechazo,https://www.zomato.com/hyderabad/flechazo-gach...,1300,"Great Buffets, Hyderabad's Hottest","Asian, Mediterranean, North Indian, Desserts","11:30 AM to 4:30 PM, 6:30 PM to 11 PM"
3,Shah Ghouse Hotel & Restaurant,https://www.zomato.com/hyderabad/shah-ghouse-h...,800,Late Night Restaurants,"Biryani, North Indian, Chinese, Seafood, Bever...",12 Noon to 2 AM
4,Over The Moon Brew Company,https://www.zomato.com/hyderabad/over-the-moon...,1200,"Best Bars & Pubs, Food Hygiene Rated Restauran...","Asian, Continental, North Indian, Chinese, Med...","12noon to 11pm (Mon, Tue, Wed, Thu, Sun), 12no..."


In [300]:
# to get the information about the data
meta_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Name         105 non-null    object
 1   Links        105 non-null    object
 2   Cost         105 non-null    object
 3   Collections  51 non-null     object
 4   Cuisines     105 non-null    object
 5   Timings      104 non-null    object
dtypes: object(6)
memory usage: 5.0+ KB


In [301]:
# finding the count of null values
meta_df.isnull().sum()

Name            0
Links           0
Cost            0
Collections    54
Cuisines        0
Timings         1
dtype: int64

In [302]:
# to get the description of the data
meta_df.describe().transpose()

Unnamed: 0,count,unique,top,freq
Name,105,105,Beyond Flavours,1
Links,105,105,https://www.zomato.com/hyderabad/beyond-flavou...,1
Cost,105,29,500,13
Collections,51,42,Food Hygiene Rated Restaurants in Hyderabad,4
Cuisines,105,92,"North Indian, Chinese",4
Timings,104,77,11 AM to 11 PM,6


In [303]:
#checking for duplicate entries
print('duplicates in entries = ',len(meta_df)-len(meta_df.drop_duplicates()))
print('duplicates in Restaurant Name = ',len(meta_df['Name'])-len(meta_df['Name'].drop_duplicates()))

duplicates in entries =  0
duplicates in Restaurant Name =  0
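The same duplicate check can also be expressed with `DataFrame.duplicated`, which flags repeated rows directly. The following is a minimal sketch on a small invented frame; `toy_df` and its values are hypothetical and are not taken from the Zomato files.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the check
toy_df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'C'],
    'Cost': ['500', '800', '500', '500'],
})

# Rows that repeat an earlier row in every column
full_row_duplicates = int(toy_df.duplicated().sum())

# Rows whose 'Name' repeats an earlier name
name_duplicates = int(toy_df.duplicated(subset='Name').sum())

print('duplicates in entries = ', full_row_duplicates)
print('duplicates in Restaurant Name = ', name_duplicates)
```

Passing `subset` restricts the comparison to the listed columns, so no intermediate de-duplicated copy is needed.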


**Web Scraping**

There is a column with links to the Zomato page of each restaurant in the dataset. [Here](https://colab.research.google.com/drive/1_l92E1d286rR8IJs3kzvq_TZy5wIOGV4#scrollTo=xwxi9maFzvg_) we scrape additional data from Zomato and update our metadata.


In [304]:
# re-reading the metadata from the already prepared csv file, which includes the scraped data

meta_df = pd.read_csv('https://raw.githubusercontent.com/NaithaniAR/ZOMATO-RESTAURANT-CLUSTERING-AND-SENTIMENT-ANALYSIS/main/new_meta.csv')

## Reviews

In [305]:
# to get the first five rows of the data set 
reviews_df.head()

Unnamed: 0,Restaurant,Reviewer,Review,Rating,Metadata,Time,Pictures
0,Beyond Flavours,Rusha Chakraborty,"The ambience was good, food was quite good . h...",5,"1 Review , 2 Followers",5/25/2019 15:54,0
1,Beyond Flavours,Anusha Tirumalaneedi,Ambience is too good for a pleasant evening. S...,5,"3 Reviews , 2 Followers",5/25/2019 14:20,0
2,Beyond Flavours,Ashok Shekhawat,A must try.. great food great ambience. Thnx f...,5,"2 Reviews , 3 Followers",5/24/2019 22:54,0
3,Beyond Flavours,Swapnil Sarkar,Soumen das and Arun was a great guy. Only beca...,5,"1 Review , 1 Follower",5/24/2019 22:11,0
4,Beyond Flavours,Dileep,Food is good.we ordered Kodi drumsticks and ba...,5,"3 Reviews , 2 Followers",5/24/2019 21:37,0


In [306]:
# to get the information about the data
reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Restaurant  10000 non-null  object
 1   Reviewer    9962 non-null   object
 2   Review      9955 non-null   object
 3   Rating      9962 non-null   object
 4   Metadata    9962 non-null   object
 5   Time        9962 non-null   object
 6   Pictures    10000 non-null  int64 
dtypes: int64(1), object(6)
memory usage: 547.0+ KB


In [307]:
# finding the count of null values
reviews_df.isnull().sum()

Restaurant     0
Reviewer      38
Review        45
Rating        38
Metadata      38
Time          38
Pictures       0
dtype: int64

# Analysis 

Defined Functions 


In [308]:
def Row_lis(DataFrame, Column_name):
    '''
    Convert a column whose cells are iterables of comma-separated strings
    into a flat list of stripped elements.

    Returns the elements as a one-column DataFrame and as a plain list.
    '''
    a = [Column_name]

    # Dropping rows that have a missing value in any column
    non_na = DataFrame.dropna()

    # Getting the column values as a 2-D array (one inner array per row)
    arr = non_na[a].to_numpy()

    # Reducing the nesting: row -> cell -> element of the cell
    flat_ls = []
    for i in arr:
        for j in i:
            for k in j:
                flat_ls.append(k)

    # Splitting the remaining strings on commas and stripping whitespace
    mylis = [s.strip() for sub in flat_ls for s in sub.split(',') if s]
    print('List size:', len(mylis))
    Elements = pd.DataFrame(mylis, columns=[a])

    return Elements, mylis



def Row_lis2(DataFrame, Column_name):
    '''
    Convert a column of comma-separated strings into a flat list of
    stripped elements.

    Returns the elements as a one-column DataFrame and as a plain list.
    '''
    a = [Column_name]

    # Dropping rows that have a missing value in any column
    non_na = DataFrame.dropna()

    # Getting the column values as a 2-D array (one inner array per row)
    arr = non_na[a].to_numpy()

    # Reducing the nesting: row -> cell
    flat_ls = []
    for i in arr:
        for j in i:
            flat_ls.append(j)

    # Splitting the remaining strings on commas and stripping whitespace
    mylis = [s.strip() for sub in flat_ls for s in sub.split(',') if s]
    print('List size:', len(mylis))
    Elements = pd.DataFrame(mylis, columns=[a])

    return Elements, mylis

def Row_lis3(DataFrame, Column_name):
    '''
    Convert a column of stringified lists (e.g. "['a', 'b']") into a
    flat list of its comma-separated elements.

    Returns the elements as a one-column DataFrame and as a plain list.
    '''
    a = [Column_name]

    # Stripping brackets and quotes from every cell and joining the cells
    # into one string (consecutive rows are joined without a separator)
    Elements = ''
    for value in DataFrame[Column_name]:
        b = str(value).replace('[', '').replace(']', '').replace('\'', '').replace('\"', '')
        Elements = Elements + b

    # Splitting the combined string on commas
    mylis = Elements.split(',')
    print('List size:', len(mylis))

    # Converting the list into a pandas DataFrame
    Elements = pd.DataFrame(mylis, columns=[a])

    return Elements, mylis



# Unique list: returns the distinct items of input_list in their original order
def unique_list(input_list):
    output_list = []
    for word in input_list:
        if word not in output_list:
            output_list.append(word)
    return output_list

# This function takes a dictionary of mappings, with keys as the characters to be
# replaced in a string and values as the characters to replace them with
def multiple_str_replaces(org_str, maps):
    for l, r in maps.items():
        org_str = org_str.replace(l, r)
    return org_str
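The helper functions above split comma-separated cells with explicit loops. A vectorised variant can be sketched with `Series.str.split` and `Series.explode`. This is only an illustration on invented data: `split_column` and `toy_df` are hypothetical names, and unlike the helpers it ignores missing values in the chosen column only, rather than dropping every row that has a missing value anywhere.

```python
import pandas as pd

# Hypothetical toy data standing in for a column such as 'Cuisines'
toy_df = pd.DataFrame({
    'Cuisines': ['North Indian, Chinese', 'Chinese', None, 'Italian,  Chinese'],
})

def split_column(df, column_name):
    """Return one stripped item per row from a comma-separated text column."""
    items = (
        df[column_name]
        .dropna()          # ignore missing cells in this column only
        .str.split(',')    # 'a, b' -> ['a', ' b']
        .explode()         # one list element per row
        .str.strip()       # remove surrounding whitespace
    )
    return items[items != '']  # drop empty fragments

items = split_column(toy_df, 'Cuisines')
counts = items.value_counts()          # frequency of each item
unique_items = sorted(items.unique())  # distinct items
print(counts)
```

Because the result is a plain Series, `value_counts()` and `unique()` can be applied to it directly without a separate de-duplication helper.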





## Meta Data 

Let us first explore the names of the columns.

In [309]:
meta_df.columns

Index(['Unnamed: 0', 'Name', 'Links', 'Cost', 'Collections', 'Cuisines',
       'Timings', 'latitude', 'longitude', 'additional_services',
       'Has_Featured', 'known_for', 'status', 'Popular_Dishes'],
      dtype='object')

In [310]:
meta_df = meta_df.drop(['Unnamed: 0'], axis =1)

Q. Arrange the restaurants in the df by the cost per person.

In [311]:
# Changing the Data Type of the 'Cost'

meta_df['Cost'] = meta_df['Cost'].str.replace(",","").astype('int64')

In [312]:
meta_df.sort_values(by='Cost',ascending=False,inplace=True)


In [313]:
meta_df.reset_index(inplace=True)
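The three steps above (cleaning the cost strings, sorting, and resetting the index) can be sketched on a small invented frame. `toy_df` and its values are hypothetical. The sketch passes `drop=True` to `reset_index`, which is a variation on the cell above: it discards the old index instead of keeping it as an extra `index` column.

```python
import pandas as pd

# Hypothetical toy data with costs stored as text, including a thousands separator
toy_df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Cost': ['800', '1,300', '150'],
})

# Remove the thousands separator, then convert to integers
toy_df['Cost'] = toy_df['Cost'].str.replace(',', '', regex=False).astype('int64')

# Sort from most to least expensive and renumber the rows from 0
ranked = toy_df.sort_values(by='Cost', ascending=False).reset_index(drop=True)
print(ranked)
```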

In [314]:
meta_df[['Cost', 'Name', 'Popular_Dishes']]

Unnamed: 0,Cost,Name,Popular_Dishes
0,2800,Collage - Hyatt Hyderabad Gachibowli,"Sushi, Sunday Brunch, Pancakes, Salads, Desert..."
1,2500,Feast - Sheraton Hyderabad Hotel,"Sushi, Dal Makhni, Pastries, Mocktails, Cheese..."
2,1900,Jonathan's Kitchen - Holiday Inn Express & Suites,"Focaccia Al Pollo, Involtini Di Pollo, Jumbo P..."
3,1900,10 Downing Street,"Chicken Burritos, Wheat Beer, Brewed Beer, Chi..."
4,1800,Cascade - Radisson Hyderabad Hitec City,"Croissant, Dinner Buffet, Sunday Brunch, Deser..."
...,...,...,...
100,200,Momos Delight,
101,200,Hunger Maggi Point,
102,200,Sweet Basket,"Barfi, Bengali Sweets, Raj Kachori, Jalebi, Pa..."
103,150,Mohammedia Shawarma,


Five most expensive restaurants in the df:

1. Collage - Hyatt Hyderabad Gachibowli

2. Feast - Sheraton Hyderabad Hotel

3. Jonathan's Kitchen - Holiday Inn Express & Suites

4. 10 Downing Street

5. Cascade - Radisson Hyderabad Hitec City


Cheapest restaurants in the df:

1. Mohammedia Shawarma

2. Amul

3. Asian Meal Box

4. Sweet Basket

5. KS Bakers

In [315]:
meta_df['Collections'].value_counts()

Food Hygiene Rated Restaurants in Hyderabad                                                                                                       4
New on Gold                                                                                                                                       2
Great Buffets                                                                                                                                     2
Hyderabad's Hottest                                                                                                                               2
Veggie Friendly                                                                                                                                   2
Trending This Week                                                                                                                                2
Pan-Asian Delicacies                                                                                            

In [316]:
meta_df['known_for'].value_counts()

Classy Ambience, Sophisticated, Worth the Price, Brunch, Appetizers, Breakfast                                                 1
Beautiful Interiors, Comfortable Seating Area, Excellent Ambience, Quick Delivery, Preparation, Portion Size                   1
Courteous Staff, Good Ambience, Main Course, Food Quality, Great Food, Tasty                                                   1
Good Space, Interesting Menu, Breakfast Options, Nice Menu, Great Price, Pocket Friendly Place                                 1
Healthy Food, Meals, Packing, Excellent Food, Quantity, Menu                                                                   1
                                                                                                                              ..
Chinese Restaurant, Comfortable Seating, Fast Service, Decor, Nice Ambience, Reasonable Price                                  1
Valet Service, Good for Large Groups, Excellent Food and Service, Family Place, Ample Seating Are

In [317]:
a,b=Row_lis2(meta_df,'known_for')
a.value_counts()

List size: 276


(known_for,)         
Beautiful View           5
Ample Seating Area       4
Worth the Price          4
Good for Large Groups    4
Rooftop Ambience         4
                        ..
Affordable Prices        1
Great Lighting           1
Great Host               1
Great Hospitality        1
Yummy Food               1
Length: 193, dtype: int64

In [318]:
b=unique_list(b)

In [319]:
meta_df['Cuisines'].value_counts()

North Indian, Chinese                        4
North Indian                                 3
Fast Food                                    2
South Indian, North Indian, Chinese          2
Biryani, North Indian, Chinese               2
                                            ..
Asian, Thai, Chinese, Sushi, Momos           1
North Indian, Mughlai                        1
North Indian, Mediterranean, European        1
Kebab, Continental, Italian, North Indian    1
Street Food, Arabian                         1
Name: Cuisines, Length: 92, dtype: int64

In [320]:
a,c=Row_lis2(meta_df,'Cuisines')
a.value_counts()

List size: 159


(Cuisines,)  
North Indian     32
Chinese          22
Continental      14
Italian          12
Asian            10
South Indian      5
Desserts          5
Kebab             4
American          4
Mediterranean     4
Biryani           4
Bakery            3
European          3
Mughlai           3
Salad             3
Modern Indian     2
Thai              2
Japanese          2
Sushi             2
Hyderabadi        2
Andhra            2
Seafood           2
Finger Food       2
Beverages         2
Mexican           1
Street Food       1
Ice Cream         1
Malaysian         1
Juices            1
Indonesian        1
Healthy Food      1
Goan              1
Fast Food         1
Cafe              1
BBQ               1
Arabian           1
Wraps             1
dtype: int64

In [321]:
c= unique_list(c)

Exploring Time Column 

In [322]:
meta_df['Timings']

0                                     24 Hours (Mon-Sun)
1      6:30am to 10:30am, 12:30pm to 3pm, 7pm to 11pm...
2                        11:30 AM to 3 PM, 7 PM to 11 PM
3                                 12 Noon to 12 Midnight
4                                               24 Hours
                             ...                        
100                                 6pm to 2am (Mon-Sun)
101                                   4:30 PM to 5:30 AM
102    10 AM to 10 PM (Mon-Thu), 8 AM to 10:30 PM (Fr...
103                                         1 PM to 1 AM
104                                        10 AM to 5 AM
Name: Timings, Length: 105, dtype: object

In [323]:
meta_df['additional_services']

0      ['Breakfast', 'Home Delivery', 'Takeaway Avail...
1      ['Breakfast', 'Home Delivery', 'Takeaway Avail...
2      ['Home Delivery', 'Serves Alcohol', 'Indoor Se...
3      ['Home Delivery', 'Takeaway Available', 'Full ...
4      ['Breakfast', 'Home Delivery', 'Valet Parking ...
                             ...                        
100    ['Home Delivery', 'Takeaway Available', 'Stand...
101    ['Breakfast', 'Home Delivery', 'Takeaway Avail...
102    ['Breakfast', 'Home Delivery', 'Takeaway Avail...
103                                                   []
104    ['Home Delivery', 'Takeaway Available', 'Veget...
Name: additional_services, Length: 105, dtype: object

Q. What are the most common services offered by the restaurants?

In [324]:
a,d= Row_lis3(meta_df,'additional_services')
a.value_counts()

List size: 457


(additional_services,)        
 Takeaway Available               66
 Indoor Seating                   47
 Table booking recommended        23
 Full Bar Available               21
 Valet Parking Available          21
                                  ..
 Outdoor SeatingHome Delivery      1
 Parking                           1
 Pet Friendly                      1
 Private Dining Area Available     1
Breakfast                          1
Length: 96, dtype: int64

**The most recurring features in the restaurants are**

1. Takeaway Available

2. Indoor Seating

3. Table booking recommended

4. Valet Parking Available

5. Full Bar Available
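The `additional_services` column stores each list as text, such as `"['Home Delivery', 'Takeaway Available']"`. One way to count the services without joining the rows into a single string is to parse every cell with `ast.literal_eval` and then explode the lists. This is a sketch on invented values; `toy_services` is a hypothetical stand-in for the real column, and it assumes every cell is a valid Python list literal.

```python
import ast
import pandas as pd

# Hypothetical stringified lists, mimicking the format of 'additional_services'
toy_services = pd.Series([
    "['Home Delivery', 'Takeaway Available']",
    "['Takeaway Available', 'Indoor Seating']",
    "[]",
])

# Turn each text cell into a real Python list
parsed = toy_services.apply(ast.literal_eval)

# One service per row; empty lists become NaN and are dropped
service_counts = parsed.explode().dropna().value_counts()
print(service_counts)
```

Parsing each cell separately keeps items from neighbouring rows apart, so merged labels such as `Outdoor SeatingHome Delivery` in the output above would not be expected to appear.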

In [325]:
d= unique_list(d)

Q. What is the count of featured restaurants in the DataFrame?

In [326]:
meta_df['Has_Featured'].value_counts()


0.00    79
1.00    26
Name: Has_Featured, dtype: int64

Q. How many restaurants have closed for business?

In [327]:
meta_df['status'].value_counts()

Open For Business     81
Permanently Closed    24
Name: status, dtype: int64

In [328]:
meta_df['Popular_Dishes'].value_counts()

Sushi, Sunday Brunch, Pancakes, Salads, Deserts, Sauce                                                                      1
Chocolate Icecreams, Firni, Sweet Pan, Authentic Hyderabadi Biryani, Lemon Chicken, Falooda                                 1
Dhaba Chicken Curry, Veg Galouti Kebab, Amritsari Kulche, Mix Vegetable, Authentic North Indian Food, Mutton Combo          1
Naga Chilli Pork, Pork Momo, Bibimbap, Lemon Grass Chicken, Teriyaki Chicken, Pork Ribs                                     1
Murgh Kalmi Kebab, Murgh Mussalam, Yakhni Shorba, Tandoori Jhinga, Chicken Lolipop, Authentic Hyderabadi Food               1
                                                                                                                           ..
Dal Kichadi, Tandoori Wings, Pineapple Grill, Hot Gulab Jamun, Panneer Butter Masala, Grilled Fish                          1
Mongolian Chicken, Cocktails, Chilli Chicken, Pasta                                                                   

Q. What is the total number of unique popular dishes served?

In [329]:
a,b=Row_lis2(meta_df,'Popular_Dishes')
a.nunique()

List size: 268


Popular_Dishes    241
dtype: int64

## Reviews 

In [330]:
reviews_df.columns


Index(['Restaurant', 'Reviewer', 'Review', 'Rating', 'Metadata', 'Time',
       'Pictures'],
      dtype='object')

Q. Explore the names of the restaurants

In [331]:
reviews_df['Restaurant']

0        Beyond Flavours
1        Beyond Flavours
2        Beyond Flavours
3        Beyond Flavours
4        Beyond Flavours
              ...       
9995    Chinese Pavilion
9996    Chinese Pavilion
9997    Chinese Pavilion
9998    Chinese Pavilion
9999    Chinese Pavilion
Name: Restaurant, Length: 10000, dtype: object

Q. Explore Ratings

In [332]:
# finding the count of null values
reviews_df.isnull().sum()

Restaurant     0
Reviewer      38
Review        45
Rating        38
Metadata      38
Time          38
Pictures       0
dtype: int64

We see that rating has 38 null values 

In [333]:
reviews_df['Rating'].describe()

count     9962
unique      10
top          5
freq      3832
Name: Rating, dtype: object

In [334]:
reviews_df['Rating'].unique()

array(['5', '4', '1', '3', '2', '3.5', '4.5', '2.5', '1.5', 'Like', nan],
      dtype=object)

In [335]:
Test_df=reviews_df

In [336]:
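# Replacing 'Like' with np.nan.
# Note: Test_df refers to the same object as reviews_df (it is not a copy), and
# .loc with only a row mask assigns NaN to every column of the matching row.
Test_df.loc[Test_df['Rating'] == 'Like'] = np.nan

# Changing the data type of the Rating column
Test_df['Rating'] = Test_df['Rating'].astype('float64')
print(Test_df['Rating'].describe())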


count   9961.00
mean       3.60
std        1.48
min        1.00
25%        3.00
50%        4.00
75%        5.00
max        5.00
Name: Rating, dtype: float64


The Rating column contains the entry 'Like', which is not a valid rating. Let us impute it with the median (50%) of the ratings, 4.00, and the NaN values with the first quartile (25%), 3.00.

In [337]:
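# Imputing 'Like' and NaN values

# 'Like' was already replaced by NaN through Test_df, so this mask selects no rows
reviews_df.loc[reviews_df['Rating'] == 'Like'] = 4.00

# NaN never compares equal to anything, so this mask selects no rows either;
# Series.isna() is the usual way to select missing values
reviews_df.loc[reviews_df['Rating'] == np.nan] = 3.00

# Changing the data type of the Rating column
reviews_df['Rating'] = reviews_df['Rating'].astype('float64')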
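The imputation described above (median for 'Like', first quartile for missing ratings) can be sketched so that only the `Rating` column is modified. The frame `toy_ratings` and its values are invented for illustration, and the median and quartile are computed from the toy data rather than taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one 'Like' entry and one missing rating
toy_ratings = pd.DataFrame({
    'Restaurant': ['A', 'A', 'B', 'B', 'C'],
    'Rating': ['5', 'Like', '3.5', np.nan, '4'],
})

# Remember where 'Like' was before converting to numbers
is_like = toy_ratings['Rating'] == 'Like'

# Non-numeric text such as 'Like' becomes NaN
numeric = pd.to_numeric(toy_ratings['Rating'], errors='coerce')

median = numeric.median()       # 50% value of the valid ratings
q1 = numeric.quantile(0.25)     # 25% value of the valid ratings

# 'Like' -> median, remaining missing values -> first quartile
toy_ratings['Rating'] = numeric.mask(is_like, median).fillna(q1)
print(toy_ratings)
```

Selecting the column explicitly leaves `Restaurant` and any other columns untouched, and `Series.isna` or `fillna` is used because a comparison with `np.nan` is always false.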




Exploring the Metadata column.

In [338]:
reviews_df['Metadata']

0             1 Review , 2 Followers
1            3 Reviews , 2 Followers
2            2 Reviews , 3 Followers
3              1 Review , 1 Follower
4            3 Reviews , 2 Followers
                    ...             
9995       53 Reviews , 54 Followers
9996        2 Reviews , 53 Followers
9997      65 Reviews , 423 Followers
9998      13 Reviews , 144 Followers
9999    472 Reviews , 1302 Followers
Name: Metadata, Length: 10000, dtype: object

In [339]:
# Splitting Metadata into Reviews and Followers
reviews_df[['Reviews', 'Followers']] = reviews_df['Metadata'].str.split(',', n=1, expand=True)

# Converting the newly created columns to numeric values
reviews_df['Reviews'] = pd.to_numeric(reviews_df['Reviews'].str.split(' ').str[0])
reviews_df['Followers'] = pd.to_numeric(reviews_df['Followers'].str.split(' ').str[1])

# Dropping the Metadata column
reviews_df = reviews_df.drop(['Metadata'], axis=1)
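As an alternative to splitting on the comma and on spaces, the two counts can be pulled out of the `Metadata` text with a single regular expression. This is a sketch on invented strings that follow the format shown above; `toy_meta` and the pattern are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical values in the same format as the Metadata column
toy_meta = pd.Series([
    '1 Review , 2 Followers',
    '53 Reviews , 54 Followers',
    '1 Review',   # reviewer without a follower count
    None,         # missing metadata
])

# Named groups become the column names of the result
pattern = r'(?P<Reviews>\d+)\s+Reviews?(?:\s*,\s*(?P<Followers>\d+)\s+Followers?)?'
extracted = toy_meta.str.extract(pattern)

# Convert the captured digits to numbers; unmatched groups stay NaN
extracted = extracted.apply(pd.to_numeric)
print(extracted)
```

Entries without a follower count end up as NaN in `Followers`, which matches the large number of missing `Followers` values reported later.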


Exploring Time. 

In [340]:
# Splitting time into various columns.

# Converting Time to datetime format
reviews_df['Time'] = pd.to_datetime(reviews_df['Time'])

# Creating new columns based on the datetime values
reviews_df['Year'] = reviews_df['Time'].dt.year
reviews_df['Month'] = reviews_df['Time'].dt.month
reviews_df['Hour'] = reviews_df['Time'].dt.hour

# Dropping the Time column
reviews_df = reviews_df.drop(['Time'], axis=1)
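The timestamps shown earlier look like `5/25/2019 15:54`, that is month/day/year followed by hours and minutes. Under that assumption, the parsing step can be sketched with an explicit format string, which avoids any ambiguity between day and month. `time_text` is a hypothetical stand-in for the `Time` column.

```python
import pandas as pd

# Hypothetical timestamps in the month/day/year format seen in the reviews
time_text = pd.Series(['5/25/2019 15:54', '5/24/2019 22:11', None])

# Parse with an explicit format; missing values become NaT
parsed = pd.to_datetime(time_text, format='%m/%d/%Y %H:%M')

# Derive the calendar parts with the .dt accessor
parts = pd.DataFrame({
    'Year': parsed.dt.year,
    'Month': parsed.dt.month,
    'Hour': parsed.dt.hour,
})
print(parts)
```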

Checking for null values

In [341]:
reviews_df.isna().sum()

Restaurant       1
Reviewer        39
Review          46
Rating          39
Pictures         1
Reviews         39
Followers     1617
Year            39
Month           39
Hour            39
dtype: int64

In [342]:
# Rows in which there are two or more null values
x = reviews_df[reviews_df.isnull().sum(axis=1) >= 2]

# Storing the index of these rows
y = x.index

x


Unnamed: 0,Restaurant,Reviewer,Review,Rating,Pictures,Reviews,Followers,Year,Month,Hour
2360,Amul,Lakshmi Narayana,,5.0,0.0,0.0,,2018.0,7.0,18.0
6449,Hyderabad Chefs,Madhurimanne97,,5.0,0.0,1.0,,2018.0,7.0,16.0
6489,Hyderabad Chefs,Harsha,,5.0,0.0,1.0,,2018.0,7.0,21.0
7601,,,,,,,,,,
8228,Al Saba Restaurant,Suresh,,5.0,0.0,1.0,,2018.0,7.0,22.0
8777,American Wild Wings,,,,0.0,,,,,
8778,American Wild Wings,,,,0.0,,,,,
8779,American Wild Wings,,,,0.0,,,,,
8780,American Wild Wings,,,,0.0,,,,,
8781,American Wild Wings,,,,0.0,,,,,


Most values in these rows are null, hence we will drop them.


In [343]:
reviews_df.drop(y, inplace = True)
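Dropping rows with two or more missing values can also be written with the `thresh` argument of `DataFrame.dropna`, which states how many non-null values a row needs in order to be kept. The sketch below uses an invented frame (`toy`) and compares both approaches.

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 have two or more missing values
toy = pd.DataFrame({
    'Restaurant': ['A', 'B', None, 'D'],
    'Review': ['good', None, None, None],
    'Rating': [5.0, 5.0, None, 4.0],
    'Followers': [2.0, None, None, 3.0],
})

# Approach used above: count nulls per row and drop by index
null_counts = toy.isna().sum(axis=1)
sparse_rows = toy[null_counts >= 2]
cleaned = toy.drop(sparse_rows.index)

# Equivalent: keep rows with at most one missing value,
# i.e. at least (number of columns - 1) non-null values
cleaned_alt = toy.dropna(thresh=toy.shape[1] - 1)
print(cleaned_alt)
```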

In [355]:
meta_df.isna().sum()

index                   0
Name                    0
Links                   0
Cost                    0
Collections            54
Cuisines                0
Timings                 0
latitude                0
longitude               0
additional_services     0
Has_Featured            0
known_for              10
status                  0
Popular_Dishes         24
dtype: int64

Imputing the remaining missing values with zero, as it is quite possible that a user has no followers and zero reviews.

In [353]:
# Imputing NaN values with 0 in the reviewer count columns only
reviews_df['Reviews'] = reviews_df['Reviews'].fillna(0)

reviews_df['Followers'] = reviews_df['Followers'].fillna(0)
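When several columns need a default value, `DataFrame.fillna` also accepts a dictionary that maps column names to fill values, so the other columns are left untouched. A minimal sketch on invented data (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data with gaps in the count columns and in the review text
toy = pd.DataFrame({
    'Reviewer': ['x', 'y'],
    'Review': ['nice', None],
    'Reviews': [1.0, None],
    'Followers': [None, 3.0],
})

# Fill only the count columns; text columns keep their missing values
filled = toy.fillna({'Reviews': 0, 'Followers': 0})
print(filled)
```

Assigning through a row mask alone, as in `df.loc[mask] = 0`, would instead overwrite every column of the selected rows, including the restaurant name and the review text.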