# **Project Name**    - Zomato Restaurant Clustering and Sentiment Analysis



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual


# **Project Summary -**
The "Zomato Restaurant Clustering and Sentiment Analysis" project aims to analyze customer feedback for restaurants on the popular food delivery platform, Zomato. The project has two main objectives:

Restaurant Clustering: In this aspect, the project will use unsupervised machine learning techniques such as K-means clustering to group similar restaurants based on various features such as cuisine type, location, rating, and price range. This will help in better understanding the customer preferences and provide insights to restaurants for improving their offerings.
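
As a rough illustration of the clustering idea, the sketch below runs a hand-written K-means (Lloyd's algorithm) on a few made-up restaurants described by cost and rating. The data, the `kmeans` helper and the choice of two clusters are assumptions for illustration only, not the project's actual pipeline; in practice a library implementation would likely be used.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical restaurants: [cost for two, average rating]
features = np.array([
    [300, 3.5], [350, 3.8], [400, 3.6],
    [2500, 4.4], [2800, 4.6], [3000, 4.5],
], dtype=float)

# Standardise each column so that cost does not dominate the distance
scaled = (features - features.mean(axis=0)) / features.std(axis=0)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

Scaling before clustering matters here because K-means relies on Euclidean distance, and an unscaled cost column would otherwise swamp the rating column.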

Sentiment Analysis: The project will also perform sentiment analysis on customer reviews to determine their overall sentiment towards the restaurant. This information can help restaurants understand the customer's likes and dislikes and take necessary measures to improve customer satisfaction.
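
To make the idea of review sentiment concrete, here is a minimal lexicon-based sketch. The word lists and the `sentiment_label` helper are hypothetical and far simpler than what a real analysis would use; they only show how a review can be mapped to a positive, negative or neutral label.

```python
# Tiny hand-made lexicons: hypothetical, for illustration only
POSITIVE = {"good", "great", "tasty", "friendly", "delicious"}
NEGATIVE = {"bad", "cold", "slow", "rude", "stale"}

def sentiment_label(review: str) -> str:
    """Label a review by counting positive and negative words."""
    # Lowercase each word and strip common trailing punctuation
    words = [w.strip(".,!?").lower() for w in review.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["Great food and friendly staff!", "Cold food, slow service."]:
    print(review, "->", sentiment_label(review))
```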

The project will gather data from the Zomato API and pre-process it to remove irrelevant information. The processed data will then be used to perform clustering and sentiment analysis. The results of the analysis will be visualized using appropriate plots and charts for better understanding.

This project will be a valuable resource for restaurants on Zomato as it will help them understand their customers better and make data-driven decisions to improve their offerings. It will also provide insights to customers by highlighting the strengths and weaknesses of restaurants in a particular location or cuisine type.

In conclusion, the "Zomato Restaurant Clustering and Sentiment Analysis" project is a practical way to draw valuable insights from customer feedback on Zomato. The project will help restaurants improve their offerings and provide a better customer experience, and will help customers make informed decisions when choosing a restaurant.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**

Zomato is an Indian restaurant aggregator and food delivery start-up founded by Deepinder Goyal and Pankaj Chaddah in 2008. Zomato provides information, menus and user-reviews of restaurants, and also has food delivery options from partner restaurants in select cities.

India is known for its diverse cuisines, served in a large number of restaurants and hotel resorts, a variety reminiscent of unity in diversity. The restaurant business in India is always evolving, and more Indians are warming up to the idea of eating restaurant food, whether by dining out or having food delivered. The growing number of restaurants in every state of India has motivated this inspection of the data to draw insights, interesting facts and figures about the Indian food industry in each city. This project therefore focuses on analysing the Zomato restaurant data for each city in India.

The project focuses on customers and the company. The task is to analyze the sentiment of the customer reviews in the data and draw useful conclusions in the form of visualizations, and also to cluster the Zomato restaurants into different segments. The data is visualized because that makes it easy to analyse at a glance. The analysis also addresses business cases that can directly help customers find the best restaurant in their locality and help the company grow and work on the areas where it is currently lagging.

The restaurant metadata can help in clustering the restaurants into segments. It also holds valuable information on cuisine and cost, which can be used in cost vs. benefit analysis.

The review data can be used for sentiment analysis, and the reviewer metadata can be used to identify the critics in the industry.


# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# importing libraries and modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from wordcloud import WordCloud

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading

In [None]:
# Load Dataset
#importing datasets
meta_df_main=pd.read_csv("/content/drive/MyDrive/Almabetter/module4/Zomato_project/Zomato Restaurant names and Metadata.csv")
review_df=pd.read_csv("/content/drive/MyDrive/Almabetter/module4/Zomato_project/Zomato Restaurant reviews.csv")

In [None]:
meta_df = meta_df_main.copy()

### Dataset First View

In [None]:
# Dataset First Look
meta_df.head()

In [None]:
meta_df.tail()

In [None]:
meta_df.info()

In [None]:
meta_df.isnull().sum()

In [None]:
meta_df[meta_df['Collections'].isnull()].head()

In [None]:
meta_df[meta_df['Timings'].isnull()]

In [None]:
meta_df.describe()

In [None]:
# Checking duplicate rows in dataset
meta_df.duplicated(keep='last').sum()

In [None]:
# Checking duplicate restaurant name
meta_df['Name'].duplicated().sum()
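
The two checks above answer different questions: one counts rows that repeat in every column, the other counts repeated restaurant names regardless of the other columns. The toy frame below (made-up data, not the Zomato file) shows how the two counts can differ.

```python
import pandas as pd

# Hypothetical data: "A" appears three times, twice with identical values
toy = pd.DataFrame({
    "Name": ["A", "B", "A", "A"],
    "Cost": [500, 300, 500, 700],
})

# Rows identical in every column; keep='last' flags the earlier copies
full_row_dupes = int(toy.duplicated(keep="last").sum())

# Repeated names only; the default keep='first' flags the later copies
name_dupes = int(toy["Name"].duplicated().sum())

print(full_row_dupes, name_dupes)
```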

In [None]:
# Converting the Cost column from strings such as "1,200" to integers
meta_df['Cost'] =  meta_df['Cost'].str.replace(",","").astype('int64')
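
The conversion above assumes every `Cost` value is a clean number once commas are removed. If the column could contain missing or malformed entries, a more defensive variant is sketched below on made-up values; the sample series and the choice to turn bad entries into missing values are assumptions, not something the dataset is known to require.

```python
import pandas as pd

# Hypothetical cost strings, including a missing value and a bad entry
costs = pd.Series(["800", "1,200", "2,500", None, "n/a"])

# Strip thousands separators, then coerce anything unparseable to a missing value
cleaned = pd.to_numeric(costs.str.replace(",", "", regex=False), errors="coerce")

# Nullable integer dtype keeps whole numbers while still allowing missing values
cleaned = cleaned.astype("Int64")
print(cleaned)
```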

### EDA on MetaData Dataset

In [None]:
meta_df.head()

In [None]:
meta_df.shape

In [None]:
meta_df['Name'].nunique()

In [None]:
top_10_costly_rest=meta_df[['Name','Cost']].groupby('Name',as_index=False).sum().sort_values(by='Cost',ascending=False).head(10)

In [None]:
# Top 10 Expensive Restaurants
plt.figure(figsize=(15,6))
x = top_10_costly_rest['Cost']
y = top_10_costly_rest['Name']
plt.title("Top 10 Most Expensive Restaurants",fontsize=20,weight='bold',color=sns.cubehelix_palette(8, start=.5, rot=-.75)[-3])
plt.ylabel("Name",weight='bold',fontsize=15)
plt.xlabel("Cost",weight='bold',fontsize=15)
plt.xticks(rotation=90)
sns.barplot(x=x, y=y,palette='plasma')
plt.show()

In [None]:
# Affordable restaurants: here I treat the lowest-priced restaurants as affordable to all groups of customers
plt.figure(figsize=(15,6))
top_10_affor_rest=meta_df[['Name','Cost']].groupby('Name',as_index=False).sum().sort_values(by='Cost',ascending=False).tail(10)
x = top_10_affor_rest['Cost']
y = top_10_affor_rest['Name']
plt.title("Top 10 Most Affordable Restaurants",fontsize=20, weight='bold',color=sns.cubehelix_palette(8, start=.5, rot=-.75)[-3])
plt.ylabel("Name",weight='bold',fontsize=15)
plt.xlabel("Cost",weight='bold',fontsize=15)
plt.xticks(rotation=90)
sns.barplot(x=x, y=y,palette='rocket')
plt.show()
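
The rankings above sort the whole frame and then take the head or tail. A possible shortcut is pandas' `nlargest` and `nsmallest`, sketched here on a made-up frame (the restaurant names and costs are hypothetical).

```python
import pandas as pd

# Hypothetical restaurants and costs
toy = pd.DataFrame({
    "Name": ["Cafe A", "Diner B", "Grill C", "Bistro D"],
    "Cost": [400, 1500, 2800, 150],
})

# Two most expensive, ordered from highest cost down
priciest = toy.nlargest(2, "Cost")

# Two cheapest, ordered from lowest cost up
cheapest = toy.nsmallest(2, "Cost")

print(priciest)
print(cheapest)
```

Note that `nsmallest` returns the cheapest row first, whereas sorting in descending order and taking the tail puts the cheapest row last.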


In [None]:
#Creating word cloud for expensive restaurants
plt.figure(figsize=(15,8))
text = " ".join(name for name in meta_df.sort_values('Cost',ascending=False).Name[:30])


# Creating word_cloud with text as argument in .generate() method

word_cloud = WordCloud(width = 1400, height = 1400,collocations =False , background_color = 'black').generate(text)

# Display the generated Word Cloud

plt.imshow(word_cloud, interpolation='bilinear')

plt.axis("off")

In [None]:
#Creating word cloud for cheap restaurants

plt.figure(figsize=(15,8))
text = " ".join(name for name in meta_df.sort_values('Cost',ascending=False).Name[-30:])


# Creating word_cloud with text as argument in .generate() method

word_cloud = WordCloud(width = 1400, height = 1400,collocations = False, background_color = 'black').generate(text)

# Display the generated Word Cloud

plt.imshow(word_cloud, interpolation='bilinear')

plt.axis("off")

In [None]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

In [None]:
# extracting the stopwords from nltk library
sw = stopwords.words('english')

In [None]:
def stopwords(text):
    '''Remove English stopwords and lowercase the remaining words.'''
    # note: this function reuses the name of the nltk `stopwords` import,
    # so `sw` must be built (cell above) before this cell is run
    text = [word.lower() for word in text.split() if word.lower() not in sw]
    # joining the list of words with space separator
    return " ".join(text)

In [None]:
# Removing stopwords from Cuisines
meta_df['Cuisines'] = meta_df['Cuisines'].apply(lambda text: stopwords(text))
meta_df['Cuisines'].head()

In [None]:
def remove_punctuation(text):
    '''a function for removing punctuation'''
    import string
    # replacing the punctuations with no space, 
    # which in effect deletes the punctuation marks 
    translator = str.maketrans('', '', string.punctuation)
    # return the text stripped of punctuation marks
    return text.translate(translator)
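
One side effect of deleting punctuation outright is that hyphenated or slash-separated words get fused together. The sketch below, on a made-up cuisine string, contrasts deletion with replacing punctuation by spaces; whether that matters for the real `Cuisines` column is an assumption to check against the data.

```python
import string

sample = "north indian, chinese, fast-food"  # hypothetical cuisine string

# Deleting punctuation: the comma disappears, but "fast-food" becomes "fastfood"
strip_punct = str.maketrans("", "", string.punctuation)
cleaned = sample.translate(strip_punct)

# Mapping punctuation to spaces instead keeps "fast" and "food" as separate words
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
spaced = " ".join(sample.translate(to_space).split())

print(cleaned)
print(spaced)
```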

In [None]:
meta_df['Cuisines'] = meta_df['Cuisines'].apply(lambda x: remove_punctuation(x))
meta_df['Cuisines'].head()

In [None]:
import re

In [None]:
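def cleaning_repeating_char(text):
    # collapse runs of a repeated character to one; \1 is a backreference to the captured character
    # note: this also shortens legitimate double letters (e.g. "desserts" -> "deserts")
    return re.sub(r'(.)\1+', r'\1', text)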
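
Collapsing repeated characters is usually done with a regex backreference (`\1`), which matches whatever the first group captured. The sketch below, on a made-up string, shows the aggressive form and a gentler variant that only trims runs of three or more characters down to two; the gentler pattern is a suggested alternative, not part of the original notebook.

```python
import re

sample = "yummmmy desserts"  # hypothetical text

# Collapse any run of the same character to a single character
collapsed = re.sub(r"(.)\1+", r"\1", sample)

# Only trim runs of three or more down to two, so normal double letters survive
gentle = re.sub(r"(.)\1{2,}", r"\1\1", sample)

print(collapsed)
print(gentle)
```

For cuisine names, where double letters such as those in "desserts" or "coffee" are legitimate, the gentler variant is likely the safer choice.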

In [None]:
meta_df['Cuisines'] = meta_df['Cuisines'].apply(lambda x: cleaning_repeating_char(x))
meta_df['Cuisines'].head()

In [None]:
def cleaning_numbers(data):
    return re.sub('[0-9]+', '', data)

In [None]:
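from collections import Counter
# join all cuisine strings and split them into individual words
text = ' '.join(meta_df['Cuisines'])
words = text.split()

# count adjacent word pairs (bigrams), skipping pairs whose first word ends with a comma
two_words = {' '.join(pair): n for pair, n in Counter(zip(words, words[1:])).items() if not pair[0].endswith(',')}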
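
Joining every row into one long string before pairing words also creates pairs that straddle two different restaurants. The toy example below (made-up cuisine strings) compares that approach with counting pairs row by row; the per-row variant is a suggested alternative rather than the notebook's method.

```python
from collections import Counter

# Hypothetical cleaned cuisine strings, one per restaurant
cuisines = ["north indian chinese", "chinese continental", "north indian"]

# Approach used above: pair adjacent words across the whole joined text
words = " ".join(cuisines).split()
joined_pairs = Counter(zip(words, words[1:]))

# Alternative: pair adjacent words within each restaurant only
per_row_pairs = Counter()
for row in cuisines:
    tokens = row.split()
    per_row_pairs.update(zip(tokens, tokens[1:]))

print(joined_pairs[("north", "indian")], per_row_pairs[("north", "indian")])
print(("chinese", "chinese") in joined_pairs, ("chinese", "chinese") in per_row_pairs)
```

Both approaches agree on genuine pairs such as "north indian", but only the joined text produces the artificial "chinese chinese" pair at the boundary between the first two restaurants.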

In [None]:
meta_df['Cuisines'] = meta_df['Cuisines'].apply(lambda x: cleaning_numbers(x))
meta_df['Cuisines'].head()

In [None]:
word_freq = pd.DataFrame(two_words.items(), columns=['Cuisine_Words', 'Frequency'])
word_freq = word_freq.sort_values(by = "Frequency", ascending = False)
word_freq_20 = word_freq[:20]

In [None]:
# Most common cuisine word pairs served in restaurants
plt.figure(figsize=(15,6))
y = word_freq_20['Cuisine_Words']
x = word_freq_20['Frequency']
plt.title("Most Served Cuisines in Restaurants",fontsize=20, weight='bold',color=sns.cubehelix_palette(8, start=.5, rot=-.75)[-3])
plt.ylabel("Cuisine Words",weight='bold',fontsize=15)
plt.xlabel("Frequency",weight='bold',fontsize=15)
plt.xticks(rotation=90)
sns.barplot(x=x, y=y,palette="plasma")
plt.show()

In [None]:
#Wordcloud for Cuisine
plt.figure(figsize=(15,8))
text = " ".join(name for name in word_freq.Cuisine_Words )


# Creating word_cloud with text as argument in .generate() method

word_cloud = WordCloud(width = 1400, height = 1400,collocations = False, background_color = 'black').generate(text)

# Display the generated Word Cloud

plt.imshow(word_cloud, interpolation='bilinear')

plt.axis("off")