![Screenshot%202024-06-24%20122528.png](attachment:Screenshot%202024-06-24%20122528.png)

# PROBLEM STATEMENT 

### The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business. 

## DATA DESCRIPTION 

## The data contains the different data related to a food order. The detailed data dictionary is given below.
### Data Dictionary: 
#### •	order_id: Unique ID of the order
#### •	customer_id: ID of the customer who ordered the food
#### •	restaurant_name: Name of the restaurant
#### •	cuisine_type: Cuisine ordered by the customer
#### •	cost_of_the_order: Cost of the order
#### •	day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
#### •	rating: Rating given by the customer out of 5
#### •	food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
#### •	delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off information


# LIST OF TABLES 

#### 1-	Top 5 rows.
#### 2-	Last 5 rows
#### 3-	Shape of dataset.
#### 4-	Datatypes of each feature.
#### 5-	Statistical summary 
#### 6-	Null values
#### 7-	Duplicate values
#### 8-	Anomalies or wrong entries.
#### 9-	Outliers and their authenticity.
#### 10-	Data Cleaning 
#### 11- Order Analysis 
#### 12- Customer Behaviour 
#### 13- Restaurant Performance 
#### 14- Demand patterns 
#### 15- Operational efficiency 
#### 16- Customer insights 


## IMPORTING LIBRARIES 

In [132]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# LOADING DATASET 

## APPLYING BASIC DATA ANALYSIS STEPS :
### 1-	Display the top 5 rows.
### 2-	Display the last 5 rows
### 3-	Check the shape of dataset.
### 4-	Check the datatypes of each feature.
### 5-	Check the Statistical summary 
### 6-	Check the null values
### 7-	Check the duplicate values
### 8-	Check the anomalies or wrong entries.
### 9-	Check the outliers and their authenticity.
### 10-	Do the necessary data cleaning steps like dropping duplicates, unnecessary columns, null value imputation, outliers treatment etc.


In [133]:
df = pd.read_csv('2-foodhub_order_New.csv')
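
The basic checks listed above map onto a handful of pandas calls. The sketch below is illustrative only: it uses a small hand-made frame named `sample` (a hypothetical stand-in, not the real file) so that it runs on its own.

```python
import pandas as pd

# Toy stand-in for the orders file; the values are illustrative only.
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 3],
    "cost_of_the_order": [30.75, 12.08, 0.0, 0.0],
    "rating": ["Not given", "5", "3", "3"],
    "delivery_time": ["20", "?", "28", "28"],
})

print(sample.head())       # top rows (up to 5)
print(sample.tail())       # last rows (up to 5)
print(sample.shape)        # (number of rows, number of columns)
sample.info()              # datatypes and non-null counts
print(sample.describe())   # statistical summary of numeric columns

null_counts = sample.isnull().sum()         # null values per column
duplicate_rows = sample.duplicated().sum()  # fully duplicated rows
print(null_counts)
print(duplicate_rows)
```

On the real data the same calls would be made on `df`.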

## HEAD 

![Screenshot%202024-06-24%20124647.png](attachment:Screenshot%202024-06-24%20124647.png)

## TAIL

![Screenshot%202024-06-24%20124727.png](attachment:Screenshot%202024-06-24%20124727.png)

## SHAPE 

(1162, 9)

## DATATYPES /SUMMARY 

![Screenshot%202024-06-24%20124900.png](attachment:Screenshot%202024-06-24%20124900.png)

## STATISTICAL SUMMARY 

![Screenshot%202024-06-24%20124931.png](attachment:Screenshot%202024-06-24%20124931.png)

## OBSERVATION :

#### The minimum value of cost_of_the_order cannot be 0, so this needs to be checked later.
#### The maximum value of cost_of_the_order is much higher than the rest; it looks like an outlier and needs to be checked later.

### CHECKING FOR NULL VALUES 

![Screenshot%202024-06-24%20125002.png](attachment:Screenshot%202024-06-24%20125002.png)

## OBSERVATION :

#### There are 5 null entries in the dataset: 3 in cuisine_type and 2 in food_preparation_time. These are handled later.

## DUPLICATES 

0

#### Next, check the delivery_time and rating columns: as seen earlier, they should be numerical but their datatype is object.

for deliver_time column: ['20' '?' '28' '15' '24' '21' '30' '26' '22' '17' '23' '25' '16' '29' '27'
 '18' '31' '32' '19' '33']
for rating column: ['Not given' '5' '3' '4']
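
One possible way to surface such wrong entries, sketched on toy data (the frame `sample` and the dictionary `bad_entries` are hypothetical names), is to list the unique values and flag those that cannot be parsed as numbers:

```python
import pandas as pd

# Toy data mimicking the two object columns; values are illustrative only.
sample = pd.DataFrame({
    "rating": ["Not given", "5", "3", "4"],
    "delivery_time": ["20", "?", "28", "15"],
})

bad_entries = {}
for col in ["delivery_time", "rating"]:
    print(f"for {col} column:", sample[col].unique())
    # errors="coerce" turns anything non-numeric into NaN,
    # so the NaN positions point at the suspicious entries.
    parsed = pd.to_numeric(sample[col], errors="coerce")
    bad_entries[col] = sorted(sample.loc[parsed.isna(), col].unique())

print(bad_entries)
```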

## BOX PLOT FOR NUMERICAL COLUMNS : 

#### To check the outliers 


## DATA CLEANING :

#### 1. REPLACING WRONG ENTRIES 
#### 2. REMOVING NULL VALUES
#### 3. REMOVING DUPLICATES 

array(['20', '?', '28', '15', '24', '21', '30', '26', '22', '17', '23',
       '25', '16', '29', '27', '18', '31', '32', '19', '33'], dtype=object)
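
A minimal sketch of replacing the wrong entry, assuming `'?'` is simply treated as a missing value (toy data, hypothetical frame name `sample`):

```python
import numpy as np
import pandas as pd

# Toy column containing the '?' placeholder seen in the unique values above.
sample = pd.DataFrame({"delivery_time": ["20", "?", "28", "15"]})

# Replace the placeholder with NaN so it is counted as a null value.
sample["delivery_time"] = sample["delivery_time"].replace("?", np.nan)

print(sample["delivery_time"].isnull().sum())
```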

![Screenshot%202024-06-24%20125236.png](attachment:Screenshot%202024-06-24%20125236.png)

![Screenshot%202024-06-24%20125310.png](attachment:Screenshot%202024-06-24%20125310.png)

![Screenshot%202024-06-24%20125358.png](attachment:Screenshot%202024-06-24%20125358.png)

![Screenshot%202024-06-24%20125310.png](attachment:Screenshot%202024-06-24%20125310.png)

![Screenshot%202024-06-24%20125526.png](attachment:Screenshot%202024-06-24%20125526.png)

array(['Not given', '5', '3', '4'], dtype=object)


![Screenshot%202024-06-24%20125652.png](attachment:Screenshot%202024-06-24%20125652.png)

In [153]:
df[df['rating']=='Not given'].T

Unnamed: 0,0,1,6,10,14,16,17,21,23,24,...,1874,1875,1878,1881,1883,1887,1891,1892,1895,1897
order_id,1477147,1477685,1477894,1477895,1478198,1477486,1477373,1478226,1478014,1476714,...,1477377,1478039,1477194,1476700,1476748,1476873,1476981,1477473,1477819,1478056
customer_id,337525,358141,157711,143926,62667,104555,139885,137565,54630,363783,...,198802,292343,62540,127036,109906,237616,138586,97838,35309,120353
restaurant_name,Hangawi,Blue Ribbon Sushi Izakaya,The Meatball Shop,Big Wong Restaurant _¤¾Ñ¼,Lucky's Famous Burgers,Sushi of Gari,Blue Ribbon Sushi Izakaya,Shake Shack,Tortaria,Cafe Mogador,...,DuMont Burger,Amy Ruth's,Blue Ribbon Sushi,Shake Shack,The Meatball Shop,Shake Shack,Shake Shack,Han Dynasty,Blue Ribbon Sushi,Blue Ribbon Sushi
cuisine_type,Korean,Japanese,Italian,Chinese,American,Japanese,Japanese,American,Mexican,Middle Eastern,...,American,Southern,Japanese,American,American,American,American,Chinese,Japanese,Japanese
cost_of_the_order,30.75,12.08,6.07,5.92,12.13,16.98,33.03,15.91,8.92,15.86,...,12.56,12.23,5.92,12.23,9.27,5.82,5.82,29.15,25.22,19.45
day_of_the_week,Weekend,Weekend,Weekend,Weekday,Weekday,Weekend,Weekend,Weekend,Weekend,Weekday,...,Weekend,Weekday,Weekday,Weekend,Weekend,Weekend,Weekend,Weekend,Weekday,Weekend
rating,Not given,Not given,Not given,Not given,Not given,Not given,Not given,Not given,Not given,Not given,...,Not given,Not given,Not given,Not given,Not given,Not given,Not given,Not given,Not given,Not given
food_preparation_time,25.0,25.0,28.0,34.0,23.0,30.0,21.0,25.0,33.0,32.0,...,35.0,32.0,27.0,27.0,24.0,26.0,22.0,29.0,31.0,28.0
delivery_time,20.0,,21.0,28.0,30.0,16.0,22.0,20.0,16.0,29.0,...,19.0,33.0,31.0,18.0,23.0,30.0,28.0,21.0,24.0,24.0


![Screenshot%202024-06-24%20125310.png](attachment:Screenshot%202024-06-24%20125310.png)

In [154]:
df['rating'].unique()

array(['Not given', '5', '3', '4'], dtype=object)


In [155]:
df['rating'].value_counts()

rating
Not given    736
5            588
4            386
3            188
Name: count, dtype: int64

![Screenshot%202024-06-24%20125814.png](attachment:Screenshot%202024-06-24%20125814.png)

### TABLE HAVING "NOT GIVEN" VALUES 

![Screenshot%202024-06-24%20145252.png](attachment:Screenshot%202024-06-24%20145252.png)

### TABLE WITHOUT "NOT GIVEN" VALUES 

![Screenshot%202024-06-24%20130727.png](attachment:Screenshot%202024-06-24%20130727.png)

array(['5', '3', '4'], dtype=object)
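
Filtering out the unrated orders could look like the following sketch (toy data; `sample` and `rated` are hypothetical names, and dropping the rows is an assumption based on the unique values shown above):

```python
import pandas as pd

# Toy data; values are illustrative only.
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "rating": ["Not given", "5", "3", "4"],
})

# Keep only the orders that actually received a rating.
rated = sample[sample["rating"] != "Not given"].copy()

print(rated["rating"].unique())
```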

![Screenshot%202024-06-24%20125310.png](attachment:Screenshot%202024-06-24%20125310.png)

### TYPECASTING 

![Screenshot%202024-06-24%20130829.png](attachment:Screenshot%202024-06-24%20130829.png)
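
A hedged sketch of the typecasting step, assuming the wrong entries have already been replaced or removed (toy data, hypothetical frame name `sample`):

```python
import numpy as np
import pandas as pd

# Toy data: ratings are clean strings, delivery_time still holds a NaN.
sample = pd.DataFrame({
    "rating": ["5", "3", "4"],
    "delivery_time": ["20", np.nan, "28"],
})

# rating has no missing values here, so an integer type works.
sample["rating"] = sample["rating"].astype(int)
# to_numeric keeps NaN as NaN and yields a float column.
sample["delivery_time"] = pd.to_numeric(sample["delivery_time"])

print(sample.dtypes)
```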

### Fetch all the rows having at least one null value

![Screenshot%202024-06-24%20130858.png](attachment:Screenshot%202024-06-24%20130858.png)

### Check for missing values in columns


![Screenshot%202024-06-24%20130920.png](attachment:Screenshot%202024-06-24%20130920.png)

### Check for percentage-wise missing values in columns


![Screenshot%202024-06-24%20130953.png](attachment:Screenshot%202024-06-24%20130953.png)
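
The three null-value checks above can be sketched as follows (toy data; `sample`, `rows_with_null`, `missing_count` and `missing_pct` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Toy data with one null in each of two columns; values are illustrative only.
sample = pd.DataFrame({
    "cuisine_type": ["Korean", None, "Italian", "Chinese"],
    "food_preparation_time": [25.0, 25.0, np.nan, 34.0],
    "delivery_time": [20.0, 21.0, 28.0, 30.0],
})

rows_with_null = sample[sample.isnull().any(axis=1)]  # rows with at least one null
missing_count = sample.isnull().sum()                 # nulls per column
missing_pct = sample.isnull().mean() * 100            # nulls per column, in percent

print(rows_with_null)
print(missing_count)
print(missing_pct)
```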

### First, check for outliers in the numeric columns that have null values

![Screenshot%202024-06-24%20131028.png](attachment:Screenshot%202024-06-24%20131028.png)

![Screenshot%202024-06-24%20131042.png](attachment:Screenshot%202024-06-24%20131042.png)

![Screenshot%202024-06-24%20131050.png](attachment:Screenshot%202024-06-24%20131050.png)

### Now let us check the categorical column that has null entries
### Understanding the distribution 



![Screenshot%202024-06-24%20131211.png](attachment:Screenshot%202024-06-24%20131211.png)

### Define the function to calculate lower and upper bounds

### plotting graph to check for outliers in cost_of_the_order column


![Screenshot%202024-06-24%20131333.png](attachment:Screenshot%202024-06-24%20131333.png)

#### Calculate the lower and upper bounds for the 'cost_of_the_order' column

#### Now removing outliers for column cost_of_the_order
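
A minimal sketch of such a bounds function, assuming the common 1.5 × IQR rule (the notebook does not state which rule it uses; `iqr_bounds` and the toy frame `sample` are hypothetical names):

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) bounds using the k * IQR rule (an assumed rule)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy costs with one obviously extreme value; values are illustrative only.
sample = pd.DataFrame({"cost_of_the_order": [10.0, 12.0, 14.0, 16.0, 18.0, 200.0]})

lower, upper = iqr_bounds(sample["cost_of_the_order"])

# Keep only the rows whose cost lies inside the bounds.
cleaned = sample[sample["cost_of_the_order"].between(lower, upper)]

print(lower, upper)
print(cleaned)
```

Whether outliers should be dropped or capped is a design choice; this sketch drops them.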

### Verifying changes


![Screenshot%202024-06-24%20131406.png](attachment:Screenshot%202024-06-24%20131406.png)

cuisine_type is a categorical column, so its null values are filled with the mode.

food_preparation_time, delivery_time and rating are all numeric and have no outliers, so their null values are filled with the mean.
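
The imputation just described could be sketched like this (toy data, hypothetical frame name `sample`; only two of the numeric columns are shown):

```python
import numpy as np
import pandas as pd

# Toy data with one null per column; values are illustrative only.
sample = pd.DataFrame({
    "cuisine_type": ["American", "American", None, "Japanese"],
    "food_preparation_time": [20.0, np.nan, 30.0, 25.0],
    "delivery_time": [20.0, 24.0, np.nan, 28.0],
})

# Categorical column: fill with the most frequent value (mode).
sample["cuisine_type"] = sample["cuisine_type"].fillna(sample["cuisine_type"].mode()[0])

# Numeric columns: fill with the column mean.
for col in ["food_preparation_time", "delivery_time"]:
    sample[col] = sample[col].fillna(sample[col].mean())

print(sample.isnull().sum())
```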

Now verifying the changes


![Screenshot%202024-06-24%20131456.png](attachment:Screenshot%202024-06-24%20131456.png)

# ORDER ANALYSIS 

## 1.What is the total number of orders in the dataset ?


The total number of orders in the data set are:  1162

## 2.What is the average cost of an order?

The average cost of order is: 16.7836

## 3.How many unique customers have placed orders?

Number of unique customers are: 859

## 4.Which restaurant has received the highest number of orders?

The restaurant with the highest number of orders is: Shake Shack
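
The four order-analysis answers can be computed along these lines (a sketch on toy data; the frame `sample` and the variable names are hypothetical):

```python
import pandas as pd

# Toy orders; values are illustrative only.
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 10, 11, 12, 12],
    "restaurant_name": ["Shake Shack", "Hangawi", "Shake Shack",
                        "Tortaria", "Shake Shack"],
    "cost_of_the_order": [10.0, 20.0, 30.0, 15.0, 25.0],
})

total_orders = len(sample)                                          # one row per order
average_cost = sample["cost_of_the_order"].mean()                   # mean order cost
unique_customers = sample["customer_id"].nunique()                  # distinct customers
top_restaurant = sample["restaurant_name"].value_counts().idxmax()  # most orders

print(total_orders, average_cost, unique_customers, top_restaurant)
```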

# CUSTOMER BEHAVIOUR

## 1. What is the average rating given by customers?



The average rating by customers is: 4.344234079173838

## 2.How does the rating vary between weekdays and weekends?

Average rating on Weekdays: 4.31
Average rating on Weekends: 4.36
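
A comparison like this is a one-line `groupby`, sketched here on toy data (hypothetical names, numeric ratings assumed):

```python
import pandas as pd

# Toy ratings; values are illustrative only.
sample = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "rating": [4, 5, 5, 3],
})

# Mean rating for each value of day_of_the_week.
rating_by_day = sample.groupby("day_of_the_week")["rating"].mean()

print(rating_by_day.round(2))
```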

## 3.Which cuisine type is ordered the most?

most ordered cuisine is:  American

## 4.What is the distribution of orders across different days of the week?

day_of_the_week
Weekend    822
Weekday    340
Name: count, dtype: int64

# RESTAURANT PERFORMANCE 

## 1. What is the average food preparation time for each restaurant?

![Screenshot%202024-06-24%20131734.png](attachment:Screenshot%202024-06-24%20131734.png)

## 2.Which restaurant has the shortest average food preparation time?

Shortest food preparation time restaurant is:  67 Burger
With avg preparation time:  20.0
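
One way to obtain the per-restaurant average and the fastest restaurant, sketched on toy data (restaurant names `A` and `B` and all variable names are hypothetical):

```python
import pandas as pd

# Toy preparation times; values are illustrative only.
sample = pd.DataFrame({
    "restaurant_name": ["A", "A", "B", "B"],
    "food_preparation_time": [20.0, 30.0, 22.0, 24.0],
})

# Average preparation time per restaurant.
avg_prep = sample.groupby("restaurant_name")["food_preparation_time"].mean()

fastest_restaurant = avg_prep.idxmin()  # label of the smallest average
fastest_time = avg_prep.min()           # the smallest average itself

print(avg_prep)
print(fastest_restaurant, fastest_time)
```

The same pattern with `delivery_time` and `idxmax()` would answer the longest-delivery question later in the notebook.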

## 3.How does the average delivery time compare across different restaurants?

![Screenshot%202024-06-24%20131836.png](attachment:Screenshot%202024-06-24%20131836.png)

![Screenshot%202024-06-24%20131922.png](attachment:Screenshot%202024-06-24%20131922.png)

## 4.Is there a correlation between the cost of the order and the rating given?

![Screenshot%202024-06-24%20132013.png](attachment:Screenshot%202024-06-24%20132013.png)
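
A correlation check of this kind might be sketched as below, assuming the Pearson coefficient is the measure of interest (toy data, hypothetical names):

```python
import pandas as pd

# Toy cost/rating pairs; values are illustrative only.
sample = pd.DataFrame({
    "cost_of_the_order": [10.0, 20.0, 30.0, 40.0],
    "rating": [3, 4, 4, 5],
})

# Pearson correlation (the pandas default) between the two columns.
cost_rating_corr = sample["cost_of_the_order"].corr(sample["rating"])

print(round(cost_rating_corr, 4))
```

A value near 0 suggests no linear relationship; values near 1 or -1 suggest a strong one.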

# DEMAND PATTERNS 

## 1.How does the demand for different cuisine types vary on weekdays versus weekends?

![Screenshot%202024-06-24%20132039.png](attachment:Screenshot%202024-06-24%20132039.png)
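
Counts of orders per cuisine and day type can be tabulated with `pd.crosstab`; the sketch below uses toy data and hypothetical names:

```python
import pandas as pd

# Toy orders; values are illustrative only.
sample = pd.DataFrame({
    "cuisine_type": ["American", "American", "American", "Japanese"],
    "day_of_the_week": ["Weekend", "Weekend", "Weekday", "Weekend"],
})

# Rows are cuisines, columns are Weekday/Weekend, cells are order counts.
demand = pd.crosstab(sample["cuisine_type"], sample["day_of_the_week"])

print(demand)
```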

## 2.Which day of the week has the highest average order cost?

![Screenshot%202024-06-24%20132110.png](attachment:Screenshot%202024-06-24%20132110.png)

Day with the highest average order cost: Weekend

## 3.What is the most common day for orders to be placed?

The most common day for orders to be placed is: Weekend

## 4.How does the average rating vary by cuisine type?

![Screenshot%202024-06-24%20132258.png](attachment:Screenshot%202024-06-24%20132258.png)

![Screenshot%202024-06-24%20132311.png](attachment:Screenshot%202024-06-24%20132311.png)

# OPERATIONAL EFFICIENCY 

## 1. What is the average delivery time for all orders?

Average delivery time for all orders is:  24.154177433247202

## 2.Which restaurant has the longest average delivery time?

![Screenshot%202024-06-24%20132834.png](attachment:Screenshot%202024-06-24%20132834.png)

## 3.Is there a relationship between food preparation time and delivery time?

![Screenshot%202024-06-24%20132917.png](attachment:Screenshot%202024-06-24%20132917.png)

## 4.How does the delivery time impact customer ratings?

![Screenshot%202024-06-24%20132946.png](attachment:Screenshot%202024-06-24%20132946.png)

# CUSTOMER INSIGHTS 

## 1.What is the repeat order rate (number of customers who have placed more than one order)? 

Repeat Order Rate: 23.86%
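
A sketch of the repeat order rate, assuming it is defined as the share of customers with more than one order (toy data, hypothetical names):

```python
import pandas as pd

# Toy orders: customers 10 and 12 ordered twice, 11 and 13 once.
sample = pd.DataFrame({"customer_id": [10, 10, 11, 12, 12, 13]})

# Number of orders placed by each customer.
orders_per_customer = sample["customer_id"].value_counts()

# Share of customers with more than one order, in percent.
repeat_rate = (orders_per_customer > 1).mean() * 100

print(f"Repeat Order Rate: {repeat_rate:.2f}%")
```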

## 2.What percentage of orders receive a rating of 4 or higher?

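Percentage of orders with rating greater than 4: 50.60%

#### Note: this figure counts only ratings strictly greater than 4 (that is, ratings of 5), while the question asks for ratings of 4 or higher.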
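
The difference between "greater than 4" and "4 or higher" can be checked with the rating counts printed earlier by `value_counts()` (588 fives, 386 fours, 188 threes). This sketch assumes those counts still hold after cleaning; the variable names are hypothetical.

```python
import pandas as pd

# Ratings rebuilt from the counts reported by value_counts() earlier.
ratings = pd.Series([5] * 588 + [4] * 386 + [3] * 188)

share_above_4 = (ratings > 4).mean() * 100     # ratings of 5 only
share_4_or_more = (ratings >= 4).mean() * 100  # ratings of 4 and 5

print(f"Rating greater than 4: {share_above_4:.2f}%")
print(f"Rating of 4 or higher: {share_4_or_more:.2f}%")
```

Under that assumption, the strict comparison gives 50.60% and the inclusive comparison gives 83.82%.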