# Zomato Project

**Problem Statement:**  
Analyze and optimize **Zomato restaurant data** to extract meaningful insights and/or build predictive models that can assist in enhancing customer experience, restaurant performance, or food delivery logistics.

---

*By* **Mowlick Armstrong**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
zdf = pd.read_csv("zomato.csv", encoding = 'ISO-8859-1')
zdf

In [None]:
zdf.head()

In [None]:
zdf.isnull().sum()

In [None]:
zdf.info()

In [None]:
zdf.describe()

In [None]:
zdf_num_col = zdf.select_dtypes(include = "number")
zdf_num_col.head()

In [None]:
cdf = pd.read_excel("Country-Code.xlsx")
cdf.head()

### 1. Merge the country sheet and Zomato file on the country code to get the country name

In [None]:
merged_df = pd.merge(zdf,cdf, on = 'Country Code', how = 'inner')
merged_df.head()
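An inner join silently drops any restaurant whose `Country Code` has no match in the country sheet. A minimal sketch of one way to check for that, using the `indicator` option of `pd.merge`. The helper name `unmatched_codes` and the toy frames are hypothetical and not part of the original analysis:

```python
import pandas as pd

def unmatched_codes(restaurants: pd.DataFrame, countries: pd.DataFrame) -> list:
    """Return the Country Code values in `restaurants` with no match in `countries`."""
    # A left join keeps every restaurant; `_merge` records where each row matched
    checked = pd.merge(restaurants, countries, on='Country Code', how='left', indicator=True)
    missing = checked.loc[checked['_merge'] == 'left_only', 'Country Code']
    return sorted(missing.unique().tolist())

# Toy data for illustration only (not taken from zomato.csv)
toy_restaurants = pd.DataFrame({'Restaurant Name': ['a', 'b', 'c'],
                                'Country Code': [1, 1, 99]})
toy_countries = pd.DataFrame({'Country Code': [1, 14],
                              'Country': ['Country P', 'Country Q']})
print(unmatched_codes(toy_restaurants, toy_countries))
```

On the notebook's data this would be called as `unmatched_codes(zdf, cdf)`; an empty list means the inner join loses no rows.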

### 2. Top rated restaurants in each city in India (based on rating and votes)

In [None]:
india_df = merged_df[merged_df['Country'] == 'India']
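# Sort so the best-rated, most-voted restaurant comes first within each city
top_rated = india_df.sort_values(['Aggregate rating', 'Votes'], ascending=[False, False])
# Keep the whole top row per city; groupby().first() takes the first non-null value per column and can mix rows
top_rated_per_city = top_rated.drop_duplicates(subset='City').sort_values('City').reset_index(drop=True)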
print(top_rated_per_city[['Restaurant Name', 'City', 'Aggregate rating', 'Votes']])

### 3. Relationship between rating and votes

In [None]:
sns.scatterplot(data=merged_df, x='Aggregate rating', y='Votes')
plt.title('Rating vs Votes')
plt.show()
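A scatter plot shows the shape of the relationship but not its strength. As a rough sketch, a rank correlation (Spearman) can put a number on it; here it is computed as the Pearson correlation of the ranks. The helper `rank_correlation` and the toy values are hypothetical:

```python
import pandas as pd

def rank_correlation(df: pd.DataFrame, x: str = 'Aggregate rating', y: str = 'Votes') -> float:
    """Spearman rank correlation between two numeric columns."""
    pair = df[[x, y]].dropna()
    # Spearman's rho equals Pearson's r computed on the ranks (ties get average ranks)
    ranks = pair.rank()
    return float(ranks[x].corr(ranks[y]))

# Toy data for illustration only (not taken from zomato.csv)
toy = pd.DataFrame({'Aggregate rating': [2.5, 3.0, 3.8, 4.2, 4.9],
                    'Votes': [10, 40, 35, 900, 5000]})
print(round(rank_correlation(toy), 2))
```

On the notebook's data the call would be `rank_correlation(merged_df)`. A rank-based measure is used in this sketch because vote counts tend to be heavily skewed, which can distort a plain Pearson correlation.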

### 4. Number of restaurants in each country

In [None]:
restaurant_count = merged_df['Country'].value_counts()
print(restaurant_count)

### 5. Top 5 restaurants with online delivery

In [None]:
online_delivery_df = merged_df[merged_df['Has Online delivery'] == 'Yes']
top_online_delivery = online_delivery_df.sort_values(['Aggregate rating', 'Votes'], ascending=[False, False])
print(top_online_delivery[['Restaurant Name', 'Aggregate rating', 'Votes']].head(5))

### 6. Cheap but best restaurants in a city (low cost, high rating)

In [None]:
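# Cheapest first; the rating only breaks ties between restaurants with the same cost
cheap_best = merged_df.sort_values(['Average Cost for two', 'Aggregate rating'], ascending=[True, False])
# Keep the whole top row per city so values from different restaurants are never mixed
cheap_best_per_city = cheap_best.drop_duplicates(subset='City').sort_values('City').reset_index(drop=True)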
print(cheap_best_per_city[['City', 'Restaurant Name', 'Average Cost for two', 'Aggregate rating']].head())
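Sorting by cost first returns the cheapest restaurant in each city and uses the rating only to break ties. One possible alternative, sketched under the assumption that "cheap" means "at or below the city's median cost", filters on cost and then ranks by rating. The helper `best_value_per_city`, the median threshold, and the toy data are all hypothetical choices:

```python
import pandas as pd

def best_value_per_city(df: pd.DataFrame) -> pd.DataFrame:
    """For each city, pick the highest-rated restaurant among those costing
    no more than that city's median cost."""
    # Median cost per city, broadcast back onto every row
    city_median = df.groupby('City')['Average Cost for two'].transform('median')
    affordable = df[df['Average Cost for two'] <= city_median]
    # Best rating first; the lower cost breaks ties
    ranked = affordable.sort_values(['Aggregate rating', 'Average Cost for two'],
                                    ascending=[False, True])
    return ranked.drop_duplicates(subset='City').sort_values('City').reset_index(drop=True)

# Toy data for illustration only (not taken from zomato.csv)
toy = pd.DataFrame({
    'City': ['P', 'P', 'P', 'Q', 'Q'],
    'Restaurant Name': ['a', 'b', 'c', 'd', 'e'],
    'Average Cost for two': [100, 300, 900, 200, 400],
    'Aggregate rating': [2.1, 4.5, 4.9, 3.0, 4.0],
})
print(best_value_per_city(toy)[['City', 'Restaurant Name', 'Aggregate rating']])
```

If the data uses placeholder values such as a cost or rating of 0 for missing entries, those rows would need to be filtered out before calling the helper.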

### 7. Top cuisines in each region

In [None]:
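# Most frequent 'Cuisines' value per (Country, City). Each value is a restaurant's full
# comma-separated list, so this is the most common combination, not a single cuisine.
top_cuisines = merged_df.groupby(['Country', 'City'])['Cuisines'].agg(
    lambda x: x.value_counts().index[0] if x.notna().any() else None
)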
print(top_cuisines.head(15))

In [None]:
# Top cuisines for cities in India:
# select the rows whose first MultiIndex level (Country) is 'India'
indian_cities_top_cuisines = top_cuisines.loc['India']
print(indian_cities_top_cuisines.head())
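Because each `Cuisines` value can list several cuisines separated by commas, grouping on the raw column finds the most common combination rather than the most common single cuisine. A hedged sketch of counting individual cuisines per city instead; `top_single_cuisine` and the toy rows are hypothetical:

```python
import pandas as pd

def top_single_cuisine(df: pd.DataFrame) -> pd.Series:
    """Most frequent individual cuisine for each (Country, City)."""
    tidy = df[['Country', 'City', 'Cuisines']].dropna(subset=['Cuisines']).copy()
    # One row per cuisine, with the whitespace after each comma removed
    tidy['Cuisines'] = tidy['Cuisines'].str.split(',')
    tidy = tidy.explode('Cuisines')
    tidy['Cuisines'] = tidy['Cuisines'].str.strip()
    # Count each cuisine per city, then keep the row with the highest count
    counts = tidy.groupby(['Country', 'City', 'Cuisines']).size().reset_index(name='n')
    counts = counts.sort_values(['Country', 'City', 'n'], ascending=[True, True, False])
    top = counts.drop_duplicates(subset=['Country', 'City'])
    return top.set_index(['Country', 'City'])['Cuisines']

# Toy data for illustration only (not taken from zomato.csv)
toy = pd.DataFrame({
    'Country': ['X'] * 5,
    'City': ['P', 'P', 'P', 'Q', 'Q'],
    'Cuisines': ['North Indian, Chinese', 'Chinese', 'Cafe, Chinese', 'Cafe', 'Cafe, Bakery'],
})
print(top_single_cuisine(toy))
```

Ties between equally frequent cuisines are resolved arbitrarily in this sketch.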

### 8. Average aggregate rating of the restaurants in each city of a country

In [None]:
city_rating = merged_df.groupby(['Country', 'City'])['Aggregate rating'].mean().reset_index()
print(city_rating.head())

### 9. Does rating influence the cost of a restaurant? (Boxplot)

In [None]:
plt.figure(figsize=(7, 4))
sns.boxplot(x='Aggregate rating', y='Average Cost for two', data=merged_df)
plt.title('Cost vs Rating')
plt.xticks(rotation=90)
plt.show()
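With one box per distinct rating value the plot is dense, and costs from different countries may be quoted in different currencies, which makes them hard to compare directly. A small sketch, assuming one country is analysed at a time and ratings lie on a 0 to 5 scale, that bins ratings into bands and reports the median cost per band. The helper `cost_by_rating_band`, the band edges, and the toy data are hypothetical:

```python
import pandas as pd

def cost_by_rating_band(df: pd.DataFrame, country: str) -> pd.Series:
    """Median 'Average Cost for two' per rating band within one country."""
    subset = df[df['Country'] == country]
    # Illustrative band edges; each interval is closed on the right
    bands = pd.cut(subset['Aggregate rating'], bins=[0, 2, 3, 4, 5], include_lowest=True)
    # observed=True drops bands that contain no restaurants
    return subset.groupby(bands, observed=True)['Average Cost for two'].median()

# Toy data for illustration only (not taken from zomato.csv)
toy = pd.DataFrame({
    'Country': ['X', 'X', 'X', 'X', 'X', 'Y'],
    'Aggregate rating': [1.5, 2.5, 2.8, 3.5, 4.6, 4.8],
    'Average Cost for two': [100, 200, 400, 500, 1500, 50],
})
print(cost_by_rating_band(toy, 'X'))
```

A median that rises steadily across the bands would support the idea that higher-rated restaurants tend to cost more, though it would not show that one causes the other.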

### 10. Share of restaurants in the top 5 cities (Pie graph)

In [None]:
city_counts = merged_df['City'].value_counts().head(5)
plt.pie(city_counts, labels=city_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Top 5 Cities by Restaurant Share')
plt.axis('equal')
plt.show()
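A pie chart built from only the top 5 cities shows each city's share of those five, not of all restaurants. A sketch of computing the shares against the full dataset and grouping the remaining cities into an "Other" slice; `city_shares` and the toy data are hypothetical:

```python
import pandas as pd

def city_shares(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Percentage of all restaurants in the top-n cities, plus an 'Other' slice."""
    # normalize=True returns fractions of the total, sorted from largest to smallest
    shares = df['City'].value_counts(normalize=True) * 100
    top = shares.head(n)
    other = shares.iloc[n:].sum()
    if other > 0:
        top = pd.concat([top, pd.Series({'Other': other})])
    return top

# Toy data for illustration only (not taken from zomato.csv)
toy = pd.DataFrame({'City': ['P'] * 5 + ['Q'] * 3 + ['R'] * 2})
print(city_shares(toy, n=2))
```

The returned Series can be passed to `plt.pie` in the same way as `city_counts`, using its index as the labels.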

### 11. Top cuisines in Indian restaurants (Pie graph)

In [None]:
indian_df = merged_df[merged_df['Country'] == 'India']
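# Split each comma-separated list into one cuisine per row and strip the whitespace
# left after each comma, so ' Chinese' and 'Chinese' are counted together
cuisines = indian_df['Cuisines'].dropna().str.split(',').explode().str.strip()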
top_cuisines = cuisines.value_counts().head(5)

plt.pie(top_cuisines, labels=top_cuisines.index, autopct='%1.1f%%', startangle=140)
plt.title('Top 5 Cuisines in Indian Restaurants')
plt.axis('equal')
plt.show()