#                          Zomato Recommendation System

# Context
 I was always fascinated by the food culture of Bengaluru. Restaurants from all over the world can be found 
 here in Bengaluru. From United States to Japan, Russia to Antarctica, you get all type of cuisines here. 
 Delivery, Dine-out, Pubs, Bars, Drinks,Buffet, Desserts you name it and Bengaluru has it.
 Currently which stands at approximately 12,000 restaurants. 
With such an high number of restaurants. This industry hasn't been saturated yet. 
And new restaurants are opening every day. However it has become difficult for them to compete with already established restaurants. 

The key issues that continue to pose a challenge to them include high real estate costs, rising food costs, shortage of quality manpower, fragmented supply chain and over-licensing. This Zomato data aims at analysing demography of the location. Most importantly it will help new restaurants 
in deciding their theme, menus, cuisine, cost etc for a particular location. It also aims at finding similarity between neighborhoods of Bengaluru on the basis of food. The dataset also contains reviews for each of the restaurant which 
will help in finding overall rating for the place.

In this notebook I will try analyzing the Buisness Problem of Zomato and create a practical recommendation system for users.

# What is Recommendation System?

The rapid growth of data collection has led to a new era of information. Data is being used to create more efficient systems and this is where Recommendation Systems come into play. Recommendation Systems are a type of information filtering systems as they improve the quality of search results and provides items that are more relevant to the search item or are realted to the search history of the user. They are active information filtering systems which personalize the information coming to a user based on his interests, relevance of the information etc. Recommender systems are used widely for recommending movies, articles, restaurants, places to visit, items to buy etc.

There are basically three types of recommender systems:-

Demographic Filtering- They offer generalized recommendations to every user, based on movie popularity and/or genre. The System recommends the same movies to users with similar demographic features.

Content Based Filtering- They suggest similar items based on a particular item. This system uses item metadata, such as genre, director, description, actors, etc. for movies, to make these recommendations.

Collaborative Filtering- This system matches persons with similar interests and provides recommendations based on this matching. Collaborative filters do not require item metadata like its content-based counterparts.

Here I will be using Content Based Filtering

Content-Based Filtering: This method uses only information about the description and attributes of the items users has previously consumed to model user's preferences. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended.

This data set consists of restaurants of Bangalore,India collected from Zomato.

My aim is to create a content based recommender system in which when I will write a restaurant name, Recommender system will look at the reviews of other restaurants, and System will recommend us other restaurants with similar reviews and sort them from the highest rated.


# Breakdown of this notebook:¶

1.#Loading the dataset: Load the data and import the libraries.
2.# Data Cleaning:
       . Deleting redundant columns.
       . Renaming the columns.
       . Dropping duplicates.
       . Cleaning individual columns.
       . Remove the NaN values from the dataset
        #Some Transformations
3.#Text Preprocessing
    .Cleaning unnecessary words in the reviews
    . Removing links and other unncessary items
    . Removing Symbols
4.#Recommendation System




# Importing Libraries

In [1]:
#Importing Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import r2_score
import warnings
warnings.filterwarnings('always')
warnings.filterwarnings('ignore')
import re
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Loading the dataset

In [2]:
#reading the dataset
zomato_real=pd.read_csv("C:/Users/ASHRITHA/Downloads/Zomato_reduced.csv")
zomato_real.head() # prints the first N rows of a DataFrame

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,...,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city),liked_food_from_review,menus_combined,location_latitude,location_longitude
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1,775.0,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,...,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,"['fried rice', 'soya chaap', 'kulcha', 'rice',...","['pasta,', 'lunch', 'buffet,', 'masala', 'papa...",12.915382,77.573638
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1,787.0,080 41714161,Banashankari,Casual Dining,...,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,"['chicken', 'fried rice', 'chicken biryan', 'c...","['momos,', 'lunch', 'buffet,', 'chocolate', 'n...",12.915382,77.573638
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8,918.0,+91 9663487993,Banashankari,"Cafe, Casual Dining",...,"Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,"['veggies', 'egg', 'pasta', 'cake', 'chocolate...","['churros,', 'cannelloni,', 'minestrone', 'sou...",12.915382,77.573638
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7,88.0,+91 9620009302,Banashankari,Quick Bites,...,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,"['pulka', 'apple', 'rice with sambar', 'dosa',...","['masala', 'dosa', 'pulka', 'apple', 'rice wit...",12.915382,77.573638
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8,166.0,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,...,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,"['roti', 'noodles', 'kulcha', 'pav bhaji', 'pa...","['panipuri,', 'gol', 'gappe', 'roti', 'noodles...",12.941726,77.575502


In [3]:
zomato_real.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9253 entries, 0 to 9252
Data columns (total 21 columns):
url                            9253 non-null object
address                        9253 non-null object
name                           9253 non-null object
online_order                   9253 non-null object
book_table                     9253 non-null object
rate                           9253 non-null float64
votes                          9253 non-null float64
phone                          9116 non-null object
location                       9253 non-null object
rest_type                      9216 non-null object
dish_liked                     4545 non-null object
cuisines                       9250 non-null object
approx_cost(for two people)    9253 non-null float64
reviews_list                   9253 non-null object
menu_item                      9253 non-null object
listed_in(type)                9253 non-null object
listed_in(city)                9253 non-null object
liked_

# Data Cleaning and Feature Engineering¶


In [4]:
#Deleting Unnnecessary Columns
zomato=zomato_real.drop(['url','dish_liked','phone'],axis=1) #Dropping the column "dish_liked", "phone", "url" and saving the new dataset as "zomato"

In [5]:
#Removing the Duplicates
zomato.duplicated().sum()
zomato.drop_duplicates(inplace=True)

In [6]:
#Remove the NaN values from the dataset
zomato.isnull().sum()
zomato.dropna(how='any',inplace=True)
zomato.info() #.info() function is used to get a concise summary of the dataframe

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4529 entries, 0 to 9252
Data columns (total 18 columns):
address                        4529 non-null object
name                           4529 non-null object
online_order                   4529 non-null object
book_table                     4529 non-null object
rate                           4529 non-null float64
votes                          4529 non-null float64
location                       4529 non-null object
rest_type                      4529 non-null object
cuisines                       4529 non-null object
approx_cost(for two people)    4529 non-null float64
reviews_list                   4529 non-null object
menu_item                      4529 non-null object
listed_in(type)                4529 non-null object
listed_in(city)                4529 non-null object
liked_food_from_review         4529 non-null object
menus_combined                 4529 non-null object
location_latitude              4529 non-null float64
locat

In [7]:
#Reading Column Names
zomato.columns

Index(['address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'location', 'rest_type', 'cuisines', 'approx_cost(for two people)',
       'reviews_list', 'menu_item', 'listed_in(type)', 'listed_in(city)',
       'liked_food_from_review', 'menus_combined', 'location_latitude',
       'location_longitude'],
      dtype='object')

In [8]:
#Changing the column names
zomato = zomato.rename(columns={'approx_cost(for two people)':'cost','listed_in(type)':'type',
                                  'listed_in(city)':'city'})
zomato.columns

Index(['address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'location', 'rest_type', 'cuisines', 'cost', 'reviews_list',
       'menu_item', 'type', 'city', 'liked_food_from_review', 'menus_combined',
       'location_latitude', 'location_longitude'],
      dtype='object')

In [9]:
#Some Transformations
zomato['cost'] = zomato['cost'].astype(str) #Changing the cost to string
zomato['cost'] = zomato['cost'].apply(lambda x: x.replace(',','.')) #Using lambda function to replace ',' from cost
zomato['cost'] = zomato['cost'].astype(float) # Changing the cost to Float
zomato.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4529 entries, 0 to 9252
Data columns (total 18 columns):
address                   4529 non-null object
name                      4529 non-null object
online_order              4529 non-null object
book_table                4529 non-null object
rate                      4529 non-null float64
votes                     4529 non-null float64
location                  4529 non-null object
rest_type                 4529 non-null object
cuisines                  4529 non-null object
cost                      4529 non-null float64
reviews_list              4529 non-null object
menu_item                 4529 non-null object
type                      4529 non-null object
city                      4529 non-null object
liked_food_from_review    4529 non-null object
menus_combined            4529 non-null object
location_latitude         4529 non-null float64
location_longitude        4529 non-null float64
dtypes: float64(5), object(13)
memory usag

In [10]:
#Reading Rate of dataset
zomato['rate'].unique()

array([4.1, 3.8, 3.7, 4.6, 4. , 4.2, 3.9, 3. , 3.6, 2.8, 4.4, 3.1, 4.3,
       2.6, 3.3, 3.5, 3.2, 4.5, 2.5, 2.9, 3.4, 2.7, 4.7, 2.4, 2.2, 2.3,
       4.8, 4.9, 2.1, 2. , 1.8])

In [12]:
# Adjust the column names
zomato.name = zomato.name.apply(lambda x:x.title())
zomato.online_order.replace(('Yes','No'),(True, False),inplace=True)
zomato.book_table.replace(('Yes','No'),(True, False),inplace=True)
zomato.cost.unique()

array([ 800.,  300.,  600.,  700.,  550.,  500.,  450.,  650.,  400.,
        750.,  200.,  850., 1200.,  150.,  350.,  250., 1500., 1300.,
       1000.,  100.,  900., 1100., 1600.,  950.,  230., 1700., 1350.,
       2200., 1400., 2000., 1800., 1900.,  180.,  330., 2500., 2100.,
       3000., 2800., 3400.,   40., 1250., 3500., 4000., 2400., 1450.,
       3200., 6000., 1050., 4100., 2300.,  120., 2600., 5000., 3700.,
       1650., 2700., 4500.])

In [13]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775.0,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,"['fried rice', 'soya chaap', 'kulcha', 'rice',...","['pasta,', 'lunch', 'buffet,', 'masala', 'papa...",12.915382,77.573638
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787.0,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,"['chicken', 'fried rice', 'chicken biryan', 'c...","['momos,', 'lunch', 'buffet,', 'chocolate', 'n...",12.915382,77.573638
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918.0,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,"['veggies', 'egg', 'pasta', 'cake', 'chocolate...","['churros,', 'cannelloni,', 'minestrone', 'sou...",12.915382,77.573638
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88.0,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,"['pulka', 'apple', 'rice with sambar', 'dosa',...","['masala', 'dosa', 'pulka', 'apple', 'rice wit...",12.915382,77.573638
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166.0,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,"['roti', 'noodles', 'kulcha', 'pav bhaji', 'pa...","['panipuri,', 'gol', 'gappe', 'roti', 'noodles...",12.941726,77.575502


In [14]:
zomato['city'].unique()

array(['Banashankari', 'Bannerghatta Road', 'Basavanagudi', 'Bellandur',
       'Brigade Road', 'Brookefield', 'BTM', 'Church Street',
       'Electronic City', 'Frazer Town', 'HSR', 'Indiranagar',
       'Jayanagar', 'JP Nagar', 'Kalyan Nagar', 'Kammanahalli',
       'Koramangala 4th Block', 'Koramangala 5th Block',
       'Koramangala 6th Block', 'Koramangala 7th Block', 'Lavelle Road',
       'Malleshwaram', 'Marathahalli', 'MG Road', 'New BEL Road',
       'Old Airport Road', 'Rajajinagar', 'Residency Road',
       'Sarjapur Road', 'Whitefield'], dtype=object)

In [15]:
zomato.head()


Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775.0,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,"['fried rice', 'soya chaap', 'kulcha', 'rice',...","['pasta,', 'lunch', 'buffet,', 'masala', 'papa...",12.915382,77.573638
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787.0,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,"['chicken', 'fried rice', 'chicken biryan', 'c...","['momos,', 'lunch', 'buffet,', 'chocolate', 'n...",12.915382,77.573638
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918.0,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,"['veggies', 'egg', 'pasta', 'cake', 'chocolate...","['churros,', 'cannelloni,', 'minestrone', 'sou...",12.915382,77.573638
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88.0,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,"['pulka', 'apple', 'rice with sambar', 'dosa',...","['masala', 'dosa', 'pulka', 'apple', 'rice wit...",12.915382,77.573638
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166.0,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,"['roti', 'noodles', 'kulcha', 'pav bhaji', 'pa...","['panipuri,', 'gol', 'gappe', 'roti', 'noodles...",12.941726,77.575502


In [16]:
## Checking Null values
zomato.isnull().sum()

address                   0
name                      0
online_order              0
book_table                0
rate                      0
votes                     0
location                  0
rest_type                 0
cuisines                  0
cost                      0
reviews_list              0
menu_item                 0
type                      0
city                      0
liked_food_from_review    0
menus_combined            0
location_latitude         0
location_longitude        0
dtype: int64

In [17]:
## Computing Mean Rating
restaurants = list(zomato['name'].unique())
zomato['Mean Rating'] = 0

for i in range(len(restaurants)):
    zomato['Mean Rating'][zomato['name'] == restaurants[i]] = zomato['rate'][zomato['name'] == restaurants[i]].mean()

In [18]:

zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude,Mean Rating
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775.0,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,"['fried rice', 'soya chaap', 'kulcha', 'rice',...","['pasta,', 'lunch', 'buffet,', 'masala', 'papa...",12.915382,77.573638,4.15
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787.0,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,"['chicken', 'fried rice', 'chicken biryan', 'c...","['momos,', 'lunch', 'buffet,', 'chocolate', 'n...",12.915382,77.573638,4.1
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918.0,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,"['veggies', 'egg', 'pasta', 'cake', 'chocolate...","['churros,', 'cannelloni,', 'minestrone', 'sou...",12.915382,77.573638,3.8
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88.0,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,"['pulka', 'apple', 'rice with sambar', 'dosa',...","['masala', 'dosa', 'pulka', 'apple', 'rice wit...",12.915382,77.573638,3.7
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166.0,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,"['roti', 'noodles', 'kulcha', 'pav bhaji', 'pa...","['panipuri,', 'gol', 'gappe', 'roti', 'noodles...",12.941726,77.575502,3.8


In [19]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range = (1,5))

zomato[['Mean Rating']] = scaler.fit_transform(zomato[['Mean Rating']]).round(2)

zomato.sample(3)

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude,Mean Rating
428,"21, 1st Floor, Next to Oracle, Kalyani Magnum ...",Tandoor And Spice,True,False,4.0,181.0,JP Nagar,Casual Dining,"North Indian, Chinese, Biryani, Hyderabadi",550.0,"[('Rated 5.0', 'RATED\n A restaurant nearby t...",[],Delivery,Bannerghatta Road,"['butter chicken', 'butter', 'chilly prawn', '...","['egg', 'biryani,', 'tandoori', 'chicken,', 'c...",12.907251,77.578271,3.84
1575,"Lemon Tree Hotel, 23, EPIP Zone, Near I Gate C...",Citrus Cafe - Lemon Tree Hotel,False,False,3.8,118.0,Whitefield,Casual Dining,"North Indian, Continental, South Indian, Asian",1900.0,"[('Rated 3.0', 'RATED\n Went here on an eveni...",[],Buffet,Brookefield,"['biscuits', 'biscuit', 'filter coffee', 'coff...","['breakfast', 'buffet', 'biscuits', 'biscuit',...",12.969637,77.749745,3.65
2909,"1702, 19th Main Road, Sector 2, HSR Layout, HS...",Feaster,True,False,3.8,88.0,HSR,Casual Dining,"Continental, Asian, Chinese, Italian, North In...",600.0,"[('Rated 1.0', ""RATED\n I ordered beef fry an...",[],Delivery,HSR,[],"['liver', 'fry,', 'burgers,', 'pizza']",12.914453,77.642694,3.58


In [20]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude,Mean Rating
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775.0,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,"['fried rice', 'soya chaap', 'kulcha', 'rice',...","['pasta,', 'lunch', 'buffet,', 'masala', 'papa...",12.915382,77.573638,4.03
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787.0,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,"['chicken', 'fried rice', 'chicken biryan', 'c...","['momos,', 'lunch', 'buffet,', 'chocolate', 'n...",12.915382,77.573638,3.97
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918.0,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,"['veggies', 'egg', 'pasta', 'cake', 'chocolate...","['churros,', 'cannelloni,', 'minestrone', 'sou...",12.915382,77.573638,3.58
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88.0,Banashankari,Quick Bites,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,"['pulka', 'apple', 'rice with sambar', 'dosa',...","['masala', 'dosa', 'pulka', 'apple', 'rice wit...",12.915382,77.573638,3.45
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166.0,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,"['roti', 'noodles', 'kulcha', 'pav bhaji', 'pa...","['panipuri,', 'gol', 'gappe', 'roti', 'noodles...",12.941726,77.575502,3.58


## Text Preprocessing
Some of the common text preprocessing / cleaning steps are:

Lower casing,
Removal of Punctuations,
Removal of Stopwords,
Removal of URLs,
Spelling correction,

In [21]:
# 5 examples of these columns before text processing:
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
2097,"[('Rated 4.0', ""RATED\n Beer Adda is a decent...","North Indian, Chinese, Pizza"
2500,"[('Rated 4.0', ""RATED\n I visited this place ...",Finger Food
3212,"[('Rated 4.0', 'RATED\n Good food. The pepper...",Kerala
2630,"[('Rated 3.0', ""RATED\n It's a good restauran...","Arabian, Kerala, Biryani, South Indian, Chines..."
1731,"[('Rated 4.0', 'RATED\n Chai point is a great...","Tea, Beverages, Fast Food"


In [22]:
## Lower Casing
zomato["reviews_list"] = zomato["reviews_list"].str.lower()
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
2199,"[('rated 4.0', 'rated\n a nice place to have ...","North Indian, Chinese"
2114,"[('rated 3.0', ""rated\n ordered garlic bread ...","North Indian, Chinese, BBQ"
839,"[('rated 3.0', ""rated\n i've been here numero...","South Indian, North Indian, Chinese, Street Food"
2229,"[('rated 1.0', 'rated\n i have been waiting f...","North Indian, Chinese"
4513,"[('rated 3.0', ""rated\n this place is known t...","North Indian, Kerala"


In [24]:
## Removal of Puctuations
import string
PUNCT_TO_REMOVE = string.punctuation
def remove_punctuation(text):
    """custom function to remove the punctuation"""
    return text.translate(str.maketrans('', '', PUNCT_TO_REMOVE))

zomato["reviews_list"] = zomato["reviews_list"].apply(lambda text: remove_punctuation(text))
zomato[['reviews_list', 'cuisines']].sample(5)

Unnamed: 0,reviews_list,cuisines
2361,rated 40 ratedn one of the favourite restaura...,"Seafood, Biryani, South Indian, Chettinad, Chi..."
2340,rated 50 ratedn the food in the restaurant tr...,"North Indian, Chinese, Biryani"
708,rated 50 ratedn amazing snacks on the go a mu...,"Beverages, Desserts"
3919,rated 30 ratedn located just opposite 1522it ...,"Finger Food, Continental, Chinese"
2170,rated 40 ratedn chicken lollipopnchicken loll...,"Biryani, Seafood, Andhra, North Indian"


Removal of Stopwords
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
def remove_stopwords(text):
    """custom function to remove the stopwords"""
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])

zomato["reviews_list"] = zomato["reviews_list"].apply(lambda text: remove_stopwords(text))

In [26]:
## Removal of URLS
def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    return url_pattern.sub(r'', text)

zomato["reviews_list"] = zomato["reviews_list"].apply(lambda text: remove_urls(text))

In [27]:
zomato[['reviews_list', 'cuisines']].sample(5)


Unnamed: 0,reviews_list,cuisines
2662,rated 40 ratedn just tried theri bengali pula...,Bengali
1943,rated 50 ratedn have eaten here multiple time...,"Italian, Salad"
2588,rated 40 ratedn my favourite go to place when...,"Beverages, Sandwich, Fast Food"
256,rated 50 ratedn if veg food tastes so good on...,"North Indian, South Indian"
337,rated 30 ratedn food quality is okaynrui mach...,Bengali


In [28]:
# RESTAURANT NAMES:
restaurant_names = list(zomato['name'].unique())
restaurant_names

['Jalsa',
 'Spice Elephant',
 'San Churro Cafe',
 'Addhuri Udupi Bhojana',
 'Grand Village',
 'Timepass Dinner',
 'Onesta',
 'Penthouse Cafe',
 'Smacznego',
 'Cafe Down The Alley',
 'Cafe Shuffle',
 'The Coffee Shack',
 'Caf-Eleven',
 'Cafe Vivacity',
 'Catch-Up-Ino',
 "Kirthi'S Biryani",
 'T3H Cafe',
 'The Vintage Cafe',
 'Woodee Pizza',
 'My Tea House',
 "Srinathji'S Cafe",
 'Redberrys',
 'Foodiction',
 'Ovenstory Pizza',
 'Faasos',
 'Behrouz Biryani',
 'Szechuan Dragon',
 'Empire Restaurant',
 'Chaatimes',
 "Mcdonald'S",
 "Domino'S Pizza",
 'Hotboxit',
 'Kitchen Garden',
 'Recipe',
 'Beijing Bites',
 'Tasty Bytes',
 'Corner House Ice Cream',
 'Biryanis And More',
 'Roving Feast',
 'Freshmenu',
 'Wamama',
 'Peppy Peppers',
 'Goa 0 Km',
 '1947',
 'Kabab Magic',
 'Gustoes Beer House',
 'The Biryani Cafe',
 'Rolls On Wheels',
 'Sri Guru Kottureshwara Davangere Benne Dosa',
 'Upahar Sagar',
 'Frozen Bottle',
 'Meghana Foods',
 'Nandhini Deluxe',
 "Vi Ra'S Bar And Restaurant",
 'Chatar Pa

In [29]:
def get_top_words(column, top_nu_of_words, nu_of_word):
    
    vec = CountVectorizer(ngram_range= nu_of_word, stop_words='english')
    
    bag_of_words = vec.fit_transform(column)
    
    sum_words = bag_of_words.sum(axis=0)
    
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)
    
    return words_freq[:top_nu_of_words]

In [30]:
zomato.head()

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude,Mean Rating
0,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775.0,Banashankari,Casual Dining,"North Indian, Mughlai, Chinese",800.0,rated 40 ratedn a beautiful place to dine int...,[],Buffet,Banashankari,"['fried rice', 'soya chaap', 'kulcha', 'rice',...","['pasta,', 'lunch', 'buffet,', 'masala', 'papa...",12.915382,77.573638,4.03
1,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787.0,Banashankari,Casual Dining,"Chinese, North Indian, Thai",800.0,rated 40 ratedn had been here for dinner with...,[],Buffet,Banashankari,"['chicken', 'fried rice', 'chicken biryan', 'c...","['momos,', 'lunch', 'buffet,', 'chocolate', 'n...",12.915382,77.573638,3.97
2,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918.0,Banashankari,"Cafe, Casual Dining","Cafe, Mexican, Italian",800.0,rated 30 ratedn ambience is not that good eno...,[],Buffet,Banashankari,"['veggies', 'egg', 'pasta', 'cake', 'chocolate...","['churros,', 'cannelloni,', 'minestrone', 'sou...",12.915382,77.573638,3.58
3,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88.0,Banashankari,Quick Bites,"South Indian, North Indian",300.0,rated 40 ratedn great food and proper karnata...,[],Buffet,Banashankari,"['pulka', 'apple', 'rice with sambar', 'dosa',...","['masala', 'dosa', 'pulka', 'apple', 'rice wit...",12.915382,77.573638,3.45
4,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166.0,Basavanagudi,Casual Dining,"North Indian, Rajasthani",600.0,rated 40 ratedn very good restaurant in neigh...,[],Buffet,Banashankari,"['roti', 'noodles', 'kulcha', 'pav bhaji', 'pa...","['panipuri,', 'gol', 'gappe', 'roti', 'noodles...",12.941726,77.575502,3.58


In [31]:
zomato.sample(5)

Unnamed: 0,address,name,online_order,book_table,rate,votes,location,rest_type,cuisines,cost,reviews_list,menu_item,type,city,liked_food_from_review,menus_combined,location_latitude,location_longitude,Mean Rating
2286,"124, Near Jyothi Nivas College, Koramangala 5t...",Peace Restaurant,True,False,3.8,658.0,Koramangala 5th Block,Casual Dining,"Chinese, Tibetan",500.0,rated 30 ratedn went here after a long day of...,[],Delivery,BTM,"['chicken', 'noodles', 'schezwan noodles', 'ri...","['momos,', 'thukpa,', 'dumplings,', 'crispy', ...",12.934011,77.62223,2.63
3573,"80 Feet Road, 1st Stage, Opposite BDA Complex,...",Angaar Indian Grill & Restaurant,False,False,3.9,121.0,HBR Layout,Casual Dining,"Arabian, Mughlai, North Indian, Chinese, Rolls",600.0,rated 40 ratedn when mutton is served in the ...,[],Delivery,Kalyan Nagar,['rice'],"['shawarma,', 'mutton', 'biryani,', 'angara', ...",13.03587,77.63236,3.71
799,"89/1, Service Road, Marathahalli, Bangalore",Moriz Restaurant,True,False,2.9,666.0,Marathahalli,Casual Dining,"Arabian, North Indian, Seafood, Chinese, Biryani",800.0,rated 30 ratedn i am a fan of this joint they...,[],Delivery,Bellandur,"['mushroom', 'gobi manchurian', 'grape', 'veg ...","['shawarma,', 'chicken', 'masala,', 'chicken',...",12.955257,77.698416,2.37
662,"68, 10th Main Road, 36th Cross, 5th Block, Jay...",Pop Hop,True,False,4.1,204.0,Jayanagar,Dessert Parlor,"Ice Cream, Desserts",300.0,rated 50 ratedn great place for some yummy ic...,[],Delivery,Basavanagudi,"['hazelnut', 'mocha ice cream', 'badam', 'moji...","['choco', 'hazelnut,', 'belgian', 'chocolate,'...",12.929273,77.582423,3.9
3932,"2/10, 80 Feet Road, RMV 2nd Stage, Poojary Lay...",Cafe Rossini,True,False,3.9,431.0,New BEL Road,Casual Dining,"Continental, Italian, Chinese",700.0,rated 50 ratedn i ordered the chinese chop su...,"['Cream of Chicken Soup', 'Veg Italiano Pasta'...",Delivery,Malleshwaram,"['latte', 'rice', 'dates', 'pasta', 'veg start...","['pasta,', 'sandwiches,', 'chilli', 'paneer,',...",13.028825,77.571148,3.71


In [32]:
zomato.shape


(4529, 19)

In [33]:
zomato.columns


Index(['address', 'name', 'online_order', 'book_table', 'rate', 'votes',
       'location', 'rest_type', 'cuisines', 'cost', 'reviews_list',
       'menu_item', 'type', 'city', 'liked_food_from_review', 'menus_combined',
       'location_latitude', 'location_longitude', 'Mean Rating'],
      dtype='object')

In [34]:
zomato=zomato.drop(['address','rest_type', 'type', 'menu_item', 'votes'],axis=1)


In [35]:
import pandas

# Randomly sample 60% of your dataframe
df_percent = zomato.sample(frac=0.5)

In [36]:
df_percent.shape


(2264, 14)


## Term Frequency-Inverse Document Frequency
Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each document. This will give you a matrix where each column represents a word in the overview vocabulary (all the words that appear in at least one document) and each column represents a restaurant, as before.

TF-IDF is the statistical method of evaluating the significance of a word in a given document.

TF — Term frequency(tf) refers to how many times a given term appears in a document.

IDF — Inverse document frequency(idf) measures the weight of the word in the document, i.e if the word is common or rare in the entire document. The TF-IDF intuition follows that the terms that appear frequently in a document are less important than terms that rarely appear. Fortunately, scikit-learn gives you a built-in TfIdfVectorizer class that produces the TF-IDF matrix quite easily.

In [37]:
df_percent.set_index('name', inplace=True)

In [38]:
indices = pd.Series(df_percent.index)


In [39]:
# Creating tf-idf matrix
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=0, stop_words='english')
tfidf_matrix = tfidf.fit_transform(df_percent['reviews_list'])


In [40]:
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)


In [41]:
def recommend(name, cosine_similarities = cosine_similarities):
    
    # Create a list to put top 10 restaurants
    recommend_restaurant = []
    
    # Find the index of the hotel entered
    idx = indices[indices == name].index[0]
    
    # Find the restaurants with a similar cosine-sim value and order them from bigges number
    score_series = pd.Series(cosine_similarities[idx]).sort_values(ascending=False)
    
    # Extract top 30 restaurant indexes with a similar cosine-sim value
    top30_indexes = list(score_series.iloc[0:31].index)
    
    # Names of the top 30 restaurants
    for each in top30_indexes:
        recommend_restaurant.append(list(df_percent.index)[each])
    
    # Creating the new data set to show similar restaurants
    df_new = pd.DataFrame(columns=['cuisines', 'Mean Rating', 'cost'])
    
    # Create the top 30 similar restaurants with some of their columns
    for each in recommend_restaurant:
        df_new = df_new.append(pd.DataFrame(df_percent[['cuisines','Mean Rating', 'cost']][df_percent.index == each].sample()))
    
    # Drop the same named restaurants and sort only the top 10 by the highest rating
    df_new = df_new.drop_duplicates(subset=['cuisines','Mean Rating', 'cost'], keep=False)
    df_new = df_new.sort_values(by='Mean Rating', ascending=False).head(10)
    
    print('TOP %s RESTAURANTS LIKE %s WITH SIMILAR REVIEWS: ' % (str(len(df_new)), name))
    
    return df_new

In [42]:
# HERE IS A RANDOM RESTAURANT. LET'S SEE THE DETAILS ABOUT THIS RESTAURANT:
df_percent[df_percent.index == 'Pai Vihar'].head()

Unnamed: 0_level_0,online_order,book_table,rate,location,cuisines,cost,reviews_list,city,liked_food_from_review,menus_combined,location_latitude,location_longitude,Mean Rating
name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Pai Vihar,True,False,2.8,Vasanth Nagar,"South Indian, Street Food, Chinese, Fast Food",400.0,rated 20 ratedn a nice place in vasanthnagar ...,Brigade Road,[],"['masala', 'dosa,', 'coffee']",12.988721,77.585169,2.55
Pai Vihar,False,False,3.2,City Market,"South Indian, Street Food, Chinese, Fast Food",400.0,rated 20 ratedn food was dry and bland i dont...,Brigade Road,[],['vada'],12.965718,77.576271,2.55


In [43]:
recommend('Pai Vihar')

TOP 10 RESTAURANTS LIKE Pai Vihar WITH SIMILAR REVIEWS: 


Unnamed: 0,cuisines,Mean Rating,cost
Brew And Barbeque - A Microbrewery Pub,"Continental, North Indian, BBQ, Steak",4.74,1400.0
Dyu Art Cafe,"Cafe, Italian, Fast Food",4.48,800.0
The Reservoire,"Continental, North Indian, Chinese, American, ...",4.48,1300.0
Foxtrot,"North Indian, Chinese, Continental, Momos",4.35,1200.0
Hoot,"Continental, Italian, North Indian",4.1,1400.0
Bonsouth,"Chettinad, Andhra, Kerala",4.1,1300.0
Mavalli Tiffin Room (Mtr),"South Indian, Beverages",4.05,300.0
Samosa Party,"Street Food, Fast Food",3.97,150.0
Krishna Kuteera,"South Indian, North Indian, Chinese",3.84,400.0
Udupi Aatithya,"South Indian, North Indian, Chinese",3.84,300.0


References

Recommender Systems in Python 101
How to build a Restaurant Recommendation Engine
Getting started with Text Preprocessing

End of the Notebook