                                    Zomato Restaurant
Project Description:
Zomato Data Analysis is one of the most useful analyses for foodies who want to taste the best
cuisines from every part of the world within their budget. It is also useful for those who
want to find value-for-money restaurants serving various cuisines in different parts of the country.
Additionally, this analysis helps people who are looking for the best cuisine in a country
and want to know which locality of that country has the largest number of restaurants serving
that cuisine.

Data Storage:
This problem statement contains two datasets: zomato.csv and Country-Code.xlsx.
Country-Code.xlsx contains two variables:
• Country Code
• Country: name of the country

The collected data is stored in the comma-separated values (CSV) file zomato.csv. Each
restaurant in the dataset is uniquely identified by its Restaurant Id. Each restaurant record contains the following variables:
• Restaurant Id: Unique id of every restaurant across various cities of the world
• Restaurant Name: Name of the restaurant
• Country Code: Country in which restaurant is located
• City: City in which restaurant is located
• Address: Address of the restaurant
• Locality: Location in the city
• Locality Verbose: Detailed description of the locality
• Longitude: Longitude coordinate of the restaurant's location
• Latitude: Latitude coordinate of the restaurant's location
• Cuisines: Cuisines offered by the restaurant
• Average Cost for two: Cost for two people, in the currency of the restaurant's country
• Currency: Currency of the country
• Has Table booking: yes/no
• Has Online delivery: yes/no
• Is delivering: yes/no
• Switch to order menu: yes/no
• Price range: range of price of food
• Aggregate Rating: Average rating out of 5
• Rating color: Color assigned according to the aggregate rating
• Rating text: Text label assigned according to the aggregate rating
• Votes: Number of ratings cast by users
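Cuisines holds several cuisines per restaurant, so treating the whole string as one category (for example with a label encoder) makes every distinct combination an unrelated class. A minimal sketch of one alternative, assuming the entries are separated by a comma and a space; the toy rows are illustrative, not records from the dataset, and the "Cuisine Count" column name is hypothetical.

```python
import pandas as pd

# Toy rows for illustration only (not real dataset records).
df = pd.DataFrame({
    "Restaurant Id": [1, 2, 3],
    "Cuisines": ["North Indian, Chinese", "Italian", "Chinese, Italian"],
})

# Split the comma-separated list into one 0/1 indicator column per cuisine.
cuisine_dummies = df["Cuisines"].str.get_dummies(sep=", ")

# A simple numeric summary: how many cuisines each restaurant offers.
df["Cuisine Count"] = cuisine_dummies.sum(axis=1)

df = pd.concat([df, cuisine_dummies], axis=1)
print(df)
```

On the full dataset this produces many indicator columns; keeping only the most frequent cuisines is one way to limit that.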

Problem statement: Using this dataset, predict two things:
1) Average Cost for two
2) Price range


Hint: Use pandas methods to combine the two datasets, then start working on the project.
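The hint can be followed with `pd.merge`, as the notebook does below. A minimal sketch of a more defensive variant: `validate="many_to_one"` makes pandas raise a `MergeError` if a country code appears twice in the lookup table, and `indicator=True` flags restaurants whose code has no match. The two small frames are toy stand-ins, not the real files.

```python
import pandas as pd

# Toy stand-ins for zomato.csv and Country-Code.xlsx (illustrative values only).
zomato = pd.DataFrame({"Restaurant Id": [10, 11, 12], "Country Code": [1, 1, 999]})
countries = pd.DataFrame({"Country Code": [1, 216], "Country": ["India", "United States"]})

merged = pd.merge(
    zomato,
    countries,
    how="left",
    on="Country Code",
    validate="many_to_one",  # raise MergeError if a code repeats in the lookup table
    indicator=True,          # add a "_merge" column recording where each row matched
)

# Restaurants whose country code has no entry in the lookup table.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Restaurant Id", "Country Code"]])

merged = merged.drop(columns="_merge")
```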

Dataset Links:
https://github.com/FlipRoboTechnologies/ML_-Datasets/blob/main/Z_Restaurant/Country-Code.xlsx
https://raw.githubusercontent.com/FlipRoboTechnologies/ML_-Datasets/main/Z_Restaurant/zomato.csv

In [5]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error, accuracy_score

In [None]:
zomato_url = "https://raw.githubusercontent.com/FlipRoboTechnologies/ML_-Datasets/main/Z_Restaurant/zomato.csv"
country_code_url = "https://github.com/FlipRoboTechnologies/ML_-Datasets/blob/main/Z_Restaurant/Country-Code.xlsx?raw=true"

In [None]:
zomato_df = pd.read_csv(zomato_url)
country_df = pd.read_excel(country_code_url)
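If `pd.read_csv` raises a `UnicodeDecodeError` on zomato.csv, the file is probably not UTF-8 encoded. A minimal sketch of a loader that tries several encodings in turn; `read_csv_with_fallback` is a hypothetical helper, and Latin-1 as the fallback is an assumption about the file, not something the source states.

```python
import io

import pandas as pd


def read_csv_with_fallback(raw: bytes, encodings=("utf-8", "latin-1")) -> pd.DataFrame:
    """Return the first DataFrame that decodes cleanly with one of `encodings`."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), encoding=enc)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise ValueError(f"could not decode with any of {encodings}") from last_error


# Latin-1 bytes that are not valid UTF-8 (0xE9 is "é" and 0xE8 is "è" in Latin-1).
sample = "Restaurant Name,City\nCaf\xe9 Lumi\xe8re,Paris\n".encode("latin-1")
df = read_csv_with_fallback(sample)
print(df)
```

For the real file, the bytes could be fetched with `urllib.request.urlopen(zomato_url).read()` and passed to the helper.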

In [None]:
print("Zomato Dataset:")
zomato_df.info()  # info() prints its summary itself and returns None
print(zomato_df.head())

In [None]:
print("\nCountry Code Dataset:")
country_df.info()  # info() prints its summary itself and returns None
print(country_df.head())

In [None]:
merged_df = pd.merge(zomato_df, country_df, how='left', on='Country Code')

print("\nMerged Dataset:")
merged_df.info()  # info() prints its summary itself and returns None
print(merged_df.head())

print("\nMissing Values:")
print(merged_df.isnull().sum())

In [None]:
merged_df.fillna('', inplace=True)

print("\nMissing Values after filling")
print(merged_df.isnull().sum())
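`fillna('')` suits text columns such as Cuisines, but if a numeric column had gaps, filling it with an empty string would turn it into an object column that later numeric steps cannot use directly. A minimal sketch of column-specific filling with a dict; the "Unknown" placeholder and the median fill are illustrative choices, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing text value and one missing numeric value (illustrative only).
df = pd.DataFrame({
    "Cuisines": ["Italian", np.nan],
    "Votes": [120.0, np.nan],
})

# A blanket empty-string fill turns the numeric "Votes" column into dtype object.
blanket = df.fillna("")

# Filling per column keeps "Votes" numeric.
targeted = df.fillna({"Cuisines": "Unknown", "Votes": df["Votes"].median()})
print(blanket.dtypes, targeted.dtypes, sep="\n\n")
```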

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(merged_df['Average Cost for two'], bins=30, kde=True)
plt.title('Distribution of Average Cost for Two')
plt.xlabel('Average Cost for Two')
plt.ylabel('Frequency')
plt.show()
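Average Cost for two is recorded in each country's own currency, so its histogram may show a long right tail. A minimal sketch of measuring skewness and trying a log transform (a swapped-in technique, not part of the original notebook); the costs below are made-up values.

```python
import numpy as np
import pandas as pd

# Illustrative costs only: most values are small, a few are far larger.
cost = pd.Series([200, 300, 400, 500, 700, 1000, 2000, 5000, 20000],
                 name="Average Cost for two", dtype="float64")

# Positive skew indicates a long right tail.
print("skew before:", round(cost.skew(), 2))

# log1p compresses the tail and maps a cost of 0 to 0 instead of -inf.
log_cost = np.log1p(cost)
print("skew after:", round(log_cost.skew(), 2))

# expm1 inverts log1p, e.g. to turn log-scale predictions back into costs.
restored = np.expm1(log_cost)
```

If a regression model is trained on the transformed target, its predictions need `np.expm1` before being compared with real costs.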

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='Price range', data=merged_df)
plt.title('Distribution of Price Range')
plt.xlabel('Price Range')
plt.ylabel('Count')
plt.show()

In [None]:
plt.figure(figsize=(12, 8))
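# numeric_only=True skips text columns; since pandas 2.0, corr() raises on them otherwise.
sns.heatmap(merged_df.corr(numeric_only=True), annot=True, cmap='coolwarm')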
plt.title('Correlation Heatmap')
plt.show()
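Beyond the full heatmap, it can help to rank features by their correlation with one target. A minimal sketch using Spearman rank correlation (a swapped-in choice, since Price range is ordinal and counts like Votes are skewed); the toy rows are illustrative only.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "Votes": [5, 40, 120, 300, 900],
    "Aggregate Rating": [2.9, 3.4, 3.8, 4.1, 4.6],
    "Price range": [1, 1, 2, 3, 4],
    "City": ["A", "B", "C", "D", "E"],
})

# Spearman works on ranks; numeric_only=True leaves out the text column City.
rank_corr = df.corr(method="spearman", numeric_only=True)

# Correlation of every other numeric column with the target, strongest first.
target_corr = (
    rank_corr["Price range"]
    .drop("Price range")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```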

In [None]:
plt.figure(figsize=(15, 10))
sns.boxplot(x='Country', y='Average Cost for two', data=merged_df)
plt.title('Average Cost for Two by Country')
plt.xticks(rotation=90)
plt.show()
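Because Average Cost for two is in each country's own currency, a cross-country box plot mixes units. A minimal sketch of comparing countries within their own currency: per-country summary statistics, plus each restaurant's cost relative to its country's median. The values are toy numbers, and "Cost vs country median" is a hypothetical column name.

```python
import pandas as pd

# Toy rows; each country's costs are in its own currency (illustrative values only).
df = pd.DataFrame({
    "Country": ["India", "India", "India", "Indonesia", "Indonesia"],
    "Average Cost for two": [400, 600, 1200, 150000, 250000],
})

# Count, median and range per country, largest country first.
summary = (
    df.groupby("Country")["Average Cost for two"]
    .agg(["count", "median", "min", "max"])
    .sort_values("count", ascending=False)
)

# Dividing by the country median gives a unitless, comparable scale.
country_median = df.groupby("Country")["Average Cost for two"].transform("median")
df["Cost vs country median"] = df["Average Cost for two"] / country_median
print(summary)
print(df)
```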

In [None]:
le = LabelEncoder()

# Encode yes/no flags and text categories as integers.
# 'Cuisines' is included because both models below use it as a feature,
# and scikit-learn estimators cannot fit on raw strings.
categorical_cols = ['Has Table booking', 'Has Online delivery', 'Is delivering',
                    'Switch to order menu', 'Rating color', 'Rating text',
                    'Country', 'Currency', 'Cuisines']
for col in categorical_cols:
    merged_df[col] = le.fit_transform(merged_df[col].astype(str))

print(merged_df.head())
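A `LabelEncoder` keeps only the mapping from its most recent fit, so reusing one encoder across columns, as in the cell above, leaves only the last column's mapping available. A minimal sketch of keeping one fitted encoder per column so encoded values can be mapped back to labels; the toy rows are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows for illustration only.
df = pd.DataFrame({
    "Has Table booking": ["Yes", "No", "No"],
    "Rating text": ["Good", "Excellent", "Average"],
})

# One fitted encoder per column; classes are sorted alphabetically before encoding.
encoders = {}
for col in df.columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Map encoded values back to their original labels.
decoded = encoders["Rating text"].inverse_transform(df["Rating text"])
print(df)
print(list(decoded))
```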

In [None]:
X_cost = merged_df[['Country', 'Longitude', 'Latitude', 'Cuisines', 'Has Table booking',
                   'Has Online delivery', 'Is delivering', 'Switch to order menu', 'Price range', 'Aggregate Rating',
                   'Votes']]
y_cost = merged_df['Average Cost for two']

X_train_cost, X_test_cost, y_train_cost, y_test_cost = train_test_split(X_cost, y_cost, test_size=0.2, random_state=42)

lr_model = LinearRegression()
lr_model.fit(X_train_cost, y_train_cost)

y_pred_cost = lr_model.predict(X_test_cost)
mse_cost = mean_squared_error(y_test_cost, y_pred_cost)
print(f'Mean Squared Error for Average Cost for Two: {mse_cost}')
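Mean squared error is expressed in squared currency units, which makes it hard to interpret. A minimal sketch of also reporting RMSE (same units as the target) and R² with scikit-learn's metrics; the data are synthetic, not the Zomato dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y is a linear function of X plus small noise (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)            # same units as the target, easier to read than MSE
r2 = r2_score(y_test, y_pred)  # share of target variance explained (1.0 is perfect)
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  R^2={r2:.4f}")
```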

In [None]:
X_price = merged_df[['Country', 'Longitude', 'Latitude', 'Cuisines', 'Has Table booking', 
                     'Has Online delivery', 'Is delivering', 'Switch to order menu', 'Aggregate Rating', 'Votes', 
                     'Average Cost for two']]
y_price = merged_df['Price range']

X_train_price, X_test_price, y_train_price, y_test_price = train_test_split(X_price, y_price, test_size=0.2, random_state=42)

rf_model = RandomForestClassifier(random_state=42)  # fixed seed so results are reproducible
rf_model.fit(X_train_price, y_train_price)

y_pred_price = rf_model.predict(X_test_price)
accuracy_price = accuracy_score(y_test_price, y_pred_price)
print(f'Accuracy for Price Range: {accuracy_price}')
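Accuracy alone can hide poor performance on rare classes. A minimal sketch of a stratified split with a per-class report and confusion matrix; synthetic data from `make_classification` stands in for the real features, and labels 1 to 4 assume Price range takes integer values in that range (an assumption, the source only calls it a range of price).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 4-class data with uneven class sizes (illustrative only).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=4,
    weights=[0.45, 0.3, 0.15, 0.1], random_state=42,
)
y = y + 1  # shift labels from 0..3 to 1..4

# stratify=y keeps class proportions the same in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[1, 2, 3, 4])
print("accuracy:", acc)
# Per-class precision and recall show whether rare classes are being missed.
print(classification_report(y_test, y_pred))
print(cm)
```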