<a href="https://colab.research.google.com/github/anish170805/Data-Analytics/blob/main/Zomato_data_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Zomato Data Analysis Report

This report presents an exploratory data analysis of a dataset containing information about restaurants from Zomato. The analysis aims to uncover insights into various aspects of the restaurant landscape, including popular restaurant categories, customer preferences, and the impact of online ordering.

The analysis covers the following key areas:

- Identification of popular restaurant categories.
- Analysis of customer engagement through votes across different restaurant types.
- Identification of the most voted restaurants.
- Exploration of the prevalence of online ordering among restaurants.
- Examination of the distribution of restaurant ratings.
- Analysis of preferred price ranges for dining.
- Comparison of ratings between restaurants that accept online orders and those that do not.
- Investigation of the relationship between online ordering availability and restaurant type.

Through visualizations and data aggregation, this analysis provides an overview of the restaurant data, highlighting trends and patterns that can help in understanding the local food scene.

## Dataset Details

The dataset used for this analysis is titled "Zomato-data-.csv". It contains information about various restaurants, including the following columns:

- **name**: The name of the restaurant.
- **online_order**: Indicates whether the restaurant accepts online orders (Yes/No).
- **book_table**: Indicates whether the restaurant accepts table bookings (Yes/No).
- **rate**: The rating of the restaurant (originally in object format, converted to float).
- **votes**: The number of votes the restaurant has received.
- **approx_cost(for two people)**: The approximate cost for a meal for two people.
- **listed_in(type)**: The type of restaurant (e.g., Buffet, Cafes, Dining).

The dataset was loaded into a pandas DataFrame for cleaning, exploration, and visualization. Initial data cleaning involved converting the 'rate' column to a numerical format. The dataset was found to have no missing values.

## Importing libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Loading the dataset

In [None]:
# The path assumes the CSV is stored in Google Drive, mounted at /content/drive in Colab
df = pd.read_csv('/content/drive/MyDrive/Zomato-data-.csv')

## Exploring the data

In [None]:
df.head()

In [None]:
df.info()

## Data cleaning and formatting

### 1. The rate column has object dtype, even though its values are numeric

In [None]:
df['rate'].unique()

The output above also shows that each rating is stored as a fraction string (the score, a '/', and the scale), so only the part before the '/' is needed.

In [None]:
# Convert 'rate' from a fraction string to float by keeping only the part before '/'
def handleRate(value):
    return float(str(value).split('/')[0])

df['rate'] = df['rate'].apply(handleRate)

In [None]:
df['rate']
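
The same conversion can also be written without a Python-level function, using pandas string methods. This is a minimal sketch on hypothetical sample values (including a made-up non-numeric placeholder, 'NEW'); `errors='coerce'` turns such entries into NaN instead of raising, whereas `handleRate` would raise a `ValueError`.

```python
import pandas as pd

# Hypothetical sample values in the "score/scale" format; 'NEW' is a made-up placeholder
rate = pd.Series(['4.1/5', '3.8/5', 'NEW'])

# Keep the part before '/', then convert to float; non-numeric parts become NaN
rate_float = pd.to_numeric(rate.str.split('/').str[0], errors='coerce')
print(rate_float)
```

On the real data this would read `pd.to_numeric(df['rate'].str.split('/').str[0], errors='coerce')`, applied before the column is converted (the `.str` accessor fails on a column that is already float).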

### 2. Checking for null values

In [None]:
df.isnull().sum()

The dataset has no null values.
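
`isnull().sum()` reports one count per column; `isnull().values.any()` collapses the whole frame into a single yes/no answer. A minimal sketch on a hypothetical three-row frame that does contain one missing rating, so the check has something to find:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing rating (illustration only)
sample = pd.DataFrame({'name': ['A', 'B', 'C'], 'rate': [4.1, np.nan, 3.5]})

per_column = sample.isnull().sum()                # missing values in each column
has_missing = bool(sample.isnull().values.any())  # single answer for the whole frame
print(per_column)
print('Any missing values:', has_missing)
```

Given the result above, `df.isnull().values.any()` on the real data should return False.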

## Data Visualization

### 1. Let's look at the listed_in(type) column to identify popular restaurant categories.

In [None]:
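# hue mirrors x so the palette applies per bar (seaborn >= 0.13 deprecates palette without hue)
sns.countplot(data=df, x='listed_in(type)', hue='listed_in(type)', palette='viridis', legend=False)
plt.xlabel('Type of Restaurant')
plt.show()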

Conclusion: The majority of the restaurants fall into the dining category.
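
The count plot can be backed by exact proportions: `value_counts(normalize=True)` returns each category's share of rows, and a share above 0.5 would confirm a strict majority. A sketch with hypothetical values standing in for `df['listed_in(type)']`:

```python
import pandas as pd

# Hypothetical stand-in for df['listed_in(type)']
types = pd.Series(['Dining', 'Dining', 'Cafes', 'Dining', 'Buffet'])

# Fraction of restaurants in each category, largest first
shares = types.value_counts(normalize=True)
print(shares)
```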

### 2. Votes by Restaurant Type

In [None]:
# Total votes per restaurant type (a Series indexed by type)
grouped_data = df.groupby('listed_in(type)')['votes'].sum()
plt.plot(grouped_data.index, grouped_data.values, c='green', marker='o')
plt.xlabel('Type of restaurant')
plt.ylabel('Votes')
plt.show()

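Conclusion: Dining restaurants received the most votes in total, which suggests the highest customer engagement; note that a total also grows with the number of Dining restaurants in the dataset.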
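
To separate popularity from category size, the total can be reported next to the restaurant count and the mean votes per restaurant. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: three Dining restaurants and one Cafe
sample = pd.DataFrame({
    'listed_in(type)': ['Dining', 'Dining', 'Dining', 'Cafes'],
    'votes': [100, 50, 30, 150],
})

# Total votes, number of restaurants and mean votes per restaurant, per type
summary = sample.groupby('listed_in(type)')['votes'].agg(['sum', 'count', 'mean'])
print(summary)
```

In this made-up sample Dining leads on total votes while the single Cafe leads on votes per restaurant; the same `agg` call on `df` shows whether the real data behaves this way.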

### 3. Identify the most-voted restaurant

In [None]:
max_votes = df['votes'].max()
restaurant_with_max_votes = df.loc[df['votes'] == max_votes, 'name']

print('Restaurant(s) with the maximum votes:')
print(restaurant_with_max_votes)
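
The boolean mask above returns every restaurant tied at the maximum, whereas `idxmax` would return only the first matching row. To list the top few restaurants while still keeping ties, `DataFrame.nlargest` accepts `keep='all'`. A sketch with hypothetical names and vote counts:

```python
import pandas as pd

# Hypothetical restaurants; B and C are tied for the most votes
sample = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'votes': [120, 300, 300, 80]})

# idxmax returns only the first row holding the maximum
first_max = sample.loc[sample['votes'].idxmax(), 'name']  # 'B'

# nlargest with keep='all' keeps every row tied at the cut-off
top = sample.nlargest(1, 'votes', keep='all')
print(top['name'].tolist())
```

On the real data, `df.nlargest(5, 'votes', keep='all')[['name', 'votes']]` would list the five most-voted restaurants plus any ties.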

### 4. Exploring the online_order column to see how many restaurants accept online orders.

In [None]:
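# hue mirrors x so the palette applies per bar (seaborn >= 0.13 deprecates palette without hue)
sns.countplot(data=df, x='online_order', hue='online_order', palette='coolwarm', legend=False)
plt.xlabel('Online Order')
plt.show()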

Conclusion: The majority of the restaurants do not accept online orders.
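
A percentage makes this comparison precise. Assuming the column holds exactly the strings 'Yes' and 'No', comparing it against 'Yes' gives a boolean Series whose mean is the share of restaurants that accept online orders. Sample values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for df['online_order']
online = pd.Series(['Yes', 'No', 'No', 'Yes', 'No'])

# The mean of a boolean Series is the fraction of True values
share_online = online.eq('Yes').mean()
print(f'{share_online:.0%} accept online orders')  # 40% accept online orders
```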

### 5. Checking the distribution of ratings in the rate column.

In [None]:
sns.histplot(df['rate'], kde=True)
plt.xlabel('Ratings')
plt.title('Ratings Distribution')
plt.show()

Conclusion: The majority of restaurants received ratings ranging from 3.5 to 4.
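
How concentrated the ratings are can be checked numerically: `Series.between` flags values inside a band (inclusive on both ends by default), and the mean of those flags is the share in the band. A sketch on hypothetical ratings:

```python
import pandas as pd

# Hypothetical ratings standing in for df['rate']
ratings = pd.Series([2.9, 3.5, 3.7, 3.9, 4.0, 4.3])

# True where 3.5 <= rating <= 4.0
in_band = ratings.between(3.5, 4.0)
share_in_band = in_band.mean()
print(f'{share_in_band:.0%} of ratings fall in [3.5, 4.0]')
```

On the real data, `df['rate'].between(3.5, 4.0).mean()` above 0.5 would support the majority claim.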

### 6. Analyze the approx_cost(for two people) column to find the preferred price range.

In [None]:
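couple_data = df['approx_cost(for two people)']
# hue mirrors x so the palette applies per bar (seaborn >= 0.13 deprecates palette without hue)
sns.countplot(x=couple_data, hue=couple_data, palette='coolwarm', legend=False)
plt.xlabel('Approximate cost for two (rupees)')
plt.xticks(rotation=45)  # many distinct price points; rotate labels to avoid overlap
plt.title('Preferred Price Range')
plt.show()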

Conclusion: The most common approximate cost for two people is 300 rupees, which suggests that this is the most popular price point.
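
The count plot shows individual price points, so its tallest bar is the mode rather than a range. A sketch that reports the mode and then groups costs into ranges with `pd.cut`; the sample costs, bin edges and labels are hypothetical, and it assumes the cost column is numeric (values stored as text, for example with thousands separators, would need converting first):

```python
import pandas as pd

# Hypothetical stand-in for df['approx_cost(for two people)']
costs = pd.Series([300, 300, 400, 500, 300, 800, 200])

# Most frequent single value (the tallest bar in the count plot)
most_common_cost = costs.mode().iloc[0]

# Hypothetical bins; intervals are right-closed: (0, 300], (300, 600], (600, 1000]
bins = [0, 300, 600, 1000]
labels = ['<=300', '301-600', '601-1000']
cost_ranges = pd.cut(costs, bins=bins, labels=labels)

print(most_common_cost)
print(cost_ranges.value_counts())
```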

### 7. Compare ratings between restaurants that accept online orders and those that don't

In [None]:
plt.figure(figsize=(6,6))
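# hue mirrors x so the palette applies per box (seaborn >= 0.13 deprecates palette without hue)
sns.boxplot(data=df, x='online_order', y='rate', hue='online_order', palette='viridis', legend=False)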
plt.xlabel('Online Order')
plt.ylabel('Ratings')
plt.title('Ratings Comparison')
plt.show()

Conclusion: Restaurants that accept online orders tend to have higher ratings than those that do not.
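
The medians and quartiles drawn by the box plot can be read off numerically with a grouped `describe()`, which reports count, mean, std, min, the 25%/50%/75% quartiles and max for each group. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows standing in for df[['online_order', 'rate']]
sample = pd.DataFrame({
    'online_order': ['Yes', 'Yes', 'No', 'No', 'No'],
    'rate': [4.1, 3.9, 3.4, 3.6, 3.8],
})

# Summary statistics of 'rate' for each online_order group
stats = sample.groupby('online_order')['rate'].describe()
print(stats[['count', '25%', '50%', '75%']])
```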

### 8. Find the relationship between order mode (online_order) and restaurant type (listed_in(type))

In [None]:
pivot_table = df.pivot_table(index='listed_in(type)', columns='online_order', aggfunc='size', fill_value=0)
sns.heatmap(pivot_table, annot=True, cmap='YlGnBu', fmt='d')
plt.title('Restaurant Type vs. Online Ordering')
plt.xlabel('Online Order')
plt.ylabel('Listed In (Type)')
plt.show()

Conclusion: Most Dining restaurants do not accept online orders, whereas most Cafes do. This may indicate that customers tend to order in person at Dining restaurants but prefer online ordering at Cafes.
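
Raw counts in the heatmap partly reflect how many restaurants each type has. Normalizing each row with `pd.crosstab(..., normalize='index')` turns counts into within-type shares, so types of different sizes can be compared directly. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows standing in for df[['listed_in(type)', 'online_order']]
sample = pd.DataFrame({
    'listed_in(type)': ['Dining', 'Dining', 'Dining', 'Cafes', 'Cafes'],
    'online_order': ['No', 'No', 'Yes', 'Yes', 'Yes'],
})

# Each row sums to 1: the share of each order mode within a restaurant type
shares = pd.crosstab(sample['listed_in(type)'], sample['online_order'], normalize='index')
print(shares.round(2))
```

The result can be passed to `sns.heatmap(shares, annot=True, fmt='.2f', cmap='YlGnBu')` in place of the count table.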