# Food Delivery Orders EDA
**Author:** Jessy Andújar Cruz

**Date:** 2025-05-27

## Introduction

### Project Description
This project aims to perform an exploratory data analysis (EDA) on a food delivery orders dataset. By investigating trends and patterns in customer orders, cuisine preferences, order costs, and delivery performance, the goal is to uncover actionable insights that could help food delivery platforms and restaurants improve their service and operations.

### Objective
The primary objectives of this analysis are:
- To clean and prepare the dataset for analysis by addressing missing values, data types, and outliers.
- To perform descriptive and visual analyses of order characteristics such as cuisine type, cost, and food preparation and delivery times.
- To engineer new features that allow for deeper insights, such as identifying late deliveries and evaluating the relationship between food preparation and delivery times.
- To segment orders by cost and satisfaction, helping identify top-selling cuisines, high-value customers, and potential areas for operational improvement.

### Dataset Information
The dataset, obtained from [Kaggle: Food Ordering and Delivery App Dataset](https://www.kaggle.com/datasets/ahsan81/food-ordering-and-delivery-app-dataset), contains 1,898 records of food delivery orders. Each record includes details such as:
- Order and customer IDs
- Restaurant name and cuisine type
- Cost of the order
- Day of the week for the order
- Customer rating
- Food preparation and delivery times

This analysis will provide a foundation for further data-driven strategies in the food delivery sector.

##### Import libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

##### Load dataset

In [16]:
df = pd.read_csv('../data/raw/food_order.csv')

##### Data overview

In [24]:
df.sample(10) # display a random sample of 10 rows from the dataset

| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 920 | 1477246 | 39705 | Bukhara Grill | Indian | 16.54 | Weekend | 5 | 23 | 17 |
| 1043 | 1477769 | 129798 | TAO | Japanese | 8.05 | Weekday | Not given | 26 | 31 |
| 262 | 1476667 | 157578 | P.J. Clarke's | American | 6.11 | Weekend | 4 | 27 | 25 |
| 1794 | 1476976 | 300552 | Shake Shack | American | 22.26 | Weekday | 5 | 33 | 32 |
| 76 | 1477921 | 97079 | Benihana | Japanese | 29.29 | Weekend | 5 | 20 | 18 |
| 872 | 1477593 | 322787 | Parm | Italian | 12.08 | Weekend | 5 | 30 | 25 |
| 669 | 1477563 | 385426 | Shake Shack | American | 17.03 | Weekday | 5 | 35 | 32 |
| 776 | 1476684 | 35631 | Shake Shack | American | 19.4 | Weekend | 5 | 32 | 24 |
| 1611 | 1477260 | 229946 | Chipotle Mexican Grill $1.99 Delivery | Mexican | 29.1 | Weekend | 4 | 27 | 25 |
| 892 | 1477024 | 68154 | Nobu Next Door | Japanese | 12.18 | Weekend | 5 | 32 | 23 |


In [19]:
df.info() # check data types and null values


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB


In [20]:
df.describe() # get summary statistics for numerical columns

| | order_id | customer_id | cost_of_the_order | food_preparation_time | delivery_time |
|---|---|---|---|---|---|
| count | 1898.0 | 1898.0 | 1898.0 | 1898.0 | 1898.0 |
| mean | 1477496.0 | 171168.478398 | 16.498851 | 27.37197 | 24.161749 |
| std | 548.0497 | 113698.139743 | 7.483812 | 4.632481 | 4.972637 |
| min | 1476547.0 | 1311.0 | 4.47 | 20.0 | 15.0 |
| 25% | 1477021.0 | 77787.75 | 12.08 | 23.0 | 20.0 |
| 50% | 1477496.0 | 128600.0 | 14.14 | 27.0 | 25.0 |
| 75% | 1477970.0 | 270525.0 | 22.2975 | 31.0 | 28.0 |
| max | 1478444.0 | 405334.0 | 35.41 | 35.0 | 33.0 |


##### Data cleaning and manipulation
- Check for duplicates (e.g., repeated `order_id` values).
- Check for outliers in the numerical columns (preparation time, delivery time, cost).
- Standardize categorical values, e.g., ensure `day_of_the_week` contains only “Weekend” and “Weekday” (no typos or variations).
- The `rating` column has dtype `object`. Convert it to a nullable integer type (pandas `Int16`, since plain `int16` cannot hold missing values) and store "Not given" ratings as null.
- Add an `order_total_time` column (food preparation time plus delivery time).
- Add a `food_preparation_time_mean` column to help restaurants improve food preparation times.
- Using `food_preparation_time_mean`, add a column that flags whether the food preparation was late.
- Apply the same logic to `order_total_time`.
- Add a column that flags whether the delivery was good, based on the customer rating (4-5 counts as good).
- Using the 25th, 50th, and 75th percentiles, add a column that labels `cost_of_the_order` as cheap, normal, or expensive.
- Customer-level features: e.g., number of orders per customer (loyalty), average spend.
- Restaurant-level features: e.g., total orders, average rating.
- Ensure all costs are positive.
- Ensure food preparation and delivery times are non-negative.
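The checklist above could be sketched roughly as follows. This is a minimal illustration under several assumptions, not the notebook's actual implementation: it runs on a tiny made-up frame that only mirrors the dataset's column names, the helper names (`iqr_outliers`, `run_checks`, `add_features`) and the new flag columns (`is_prep_late`, `is_order_late`, `is_good_rating`, `cost_category`, `customer_order_count`) are hypothetical, the mean preparation time is assumed to be computed per restaurant, "late" is assumed to mean "above the mean", and the 1.5 × IQR rule is just one common choice for flagging outliers.

```python
import numpy as np
import pandas as pd

# Made-up rows that only mirror the dataset's column names (not real data).
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "customer_id": [10, 10, 11, 12, 12, 13],
    "restaurant_name": ["A", "A", "B", "B", "C", "C"],
    "cost_of_the_order": [5.0, 12.5, 14.0, 20.0, 25.0, 31.0],
    "day_of_the_week": ["Weekend", "Weekday", "Weekend", "Weekday", "Weekend", "Weekend"],
    "rating": ["5", "Not given", "4", "3", "Not given", "5"],
    "food_preparation_time": [20, 25, 30, 35, 22, 28],
    "delivery_time": [15, 20, 25, 30, 33, 18],
})

def iqr_outliers(s):
    """Boolean mask of values outside 1.5 * IQR (an assumed outlier rule)."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

def run_checks(data):
    """Summarize the validation checks from the list above (hypothetical helper)."""
    times = data[["food_preparation_time", "delivery_time"]]
    return {
        "duplicate_order_ids": int(data["order_id"].duplicated().sum()),
        "unexpected_days": sorted(set(data["day_of_the_week"]) - {"Weekday", "Weekend"}),
        "non_positive_costs": int((data["cost_of_the_order"] <= 0).sum()),
        "negative_times": int((times < 0).any(axis=1).sum()),
        "cost_outliers": int(iqr_outliers(data["cost_of_the_order"]).sum()),
    }

def add_features(data):
    """Return a copy with the engineered columns (hypothetical helper)."""
    out = data.copy()
    # "Not given" becomes <NA>; nullable Int16 keeps the ratings as integers.
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce").astype("Int16")
    out["order_total_time"] = out["food_preparation_time"] + out["delivery_time"]
    # Assumption: the mean preparation time is computed per restaurant.
    out["food_preparation_time_mean"] = (
        out.groupby("restaurant_name")["food_preparation_time"].transform("mean")
    )
    # Assumption: "late" means above the corresponding mean.
    out["is_prep_late"] = out["food_preparation_time"] > out["food_preparation_time_mean"]
    out["is_order_late"] = out["order_total_time"] > out["order_total_time"].mean()
    # Stays <NA> where the rating is missing.
    out["is_good_rating"] = out["rating"] >= 4
    # Below the 25th percentile: cheap; above the 75th: expensive; otherwise normal.
    q1, q3 = out["cost_of_the_order"].quantile([0.25, 0.75])
    cost = out["cost_of_the_order"]
    out["cost_category"] = np.select([cost < q1, cost > q3], ["cheap", "expensive"], default="normal")
    # Customer-level feature: number of orders placed by each customer.
    out["customer_order_count"] = out.groupby("customer_id")["order_id"].transform("count")
    return out

checks = run_checks(df)
features = add_features(df)
```

On the real data, the same functions would be called on the frame loaded from `food_order.csv`. Keeping the checks separate from the feature step makes it possible to inspect `checks` before any columns are added.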