# Food Delivery Time Prediction

In the rapidly evolving food delivery industry, companies like **Zomato** and **Swiggy** rely heavily on accurate delivery time estimates to maintain transparency and enhance customer satisfaction. Reliable time predictions not only improve user experience but also optimize resource allocation and operational efficiency for delivery services.

This project focuses on building a **Machine Learning model to predict food delivery times** using historical data. The primary objective is to forecast the expected delivery duration based on factors such as order placement time, distance, and, where the data includes them, real-world conditions like traffic and weather.

We utilize the [Food Delivery Dataset by Gaurav Malik on Kaggle](https://www.kaggle.com/datasets/gauravmalik26/food-delivery-dataset) as the foundation for this work. This dataset captures key features influencing delivery logistics, allowing us to train a regression model capable of learning from previous delivery patterns.

Using **Python**, we will implement a complete machine learning pipeline (data preprocessing, feature engineering, model training, and evaluation), with models ranging from basic linear regressors to more sophisticated techniques. The goal is a model that accurately estimates delivery time and can be adapted for real-time deployment in production systems.


In [2]:
import pandas as pd
import numpy as np


In [3]:
df = pd.read_csv('deliverytime.txt')
df.head()

| | ID | Delivery_person_ID | Delivery_person_Age | Delivery_person_Ratings | Restaurant_latitude | Restaurant_longitude | Delivery_location_latitude | Delivery_location_longitude | Type_of_order | Type_of_vehicle | Time_taken(min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4607 | INDORES13DEL02 | 37 | 4.9 | 22.745049 | 75.892471 | 22.765049 | 75.912471 | Snack | motorcycle | 24 |
| 1 | B379 | BANGRES18DEL02 | 34 | 4.5 | 12.913041 | 77.683237 | 13.043041 | 77.813237 | Snack | scooter | 33 |
| 2 | 5D6D | BANGRES19DEL01 | 23 | 4.4 | 12.914264 | 77.6784 | 12.924264 | 77.6884 | Drinks | motorcycle | 26 |
| 3 | 7A6A | COIMBRES13DEL02 | 38 | 4.7 | 11.003669 | 76.976494 | 11.053669 | 77.026494 | Buffet | motorcycle | 21 |
| 4 | 70A2 | CHENRES12DEL01 | 32 | 4.6 | 12.972793 | 80.249982 | 13.012793 | 80.289982 | Snack | scooter | 30 |
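
The restaurant and delivery coordinates in this preview can be combined into a single distance feature, which the introduction lists among the factors that drive delivery time. Below is a minimal sketch, assuming the great-circle (haversine) formula on a spherical Earth of radius 6371 km is an acceptable approximation. The function name `haversine_km` is hypothetical and not part of the original notebook, and a straight-line distance ignores the actual road route.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of the spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates taken from row 0 of the df.head() output
distance = haversine_km(22.745049, 75.892471, 22.765049, 75.912471)
print(f"{distance:.2f} km")
```

For row 0 this comes out at roughly 3 km. On the full table, the same function could be applied row by row with `df.apply(..., axis=1)`, or vectorized by swapping the `math` calls for their `numpy` equivalents.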


In [16]:
df.shape

(45593, 11)

In [14]:
df.describe()

| | Delivery_person_Age | Delivery_person_Ratings | Restaurant_latitude | Restaurant_longitude | Delivery_location_latitude | Delivery_location_longitude | Time_taken(min) |
|---|---|---|---|---|---|---|---|
| count | 45593.0 | 45593.0 | 45593.0 | 45593.0 | 45593.0 | 45593.0 | 45593.0 |
| mean | 29.544075 | 4.632367 | 17.017729 | 70.231332 | 17.465186 | 70.845702 | 26.294607 |
| std | 5.696793 | 0.327708 | 8.185109 | 22.883647 | 7.335122 | 21.118812 | 9.383806 |
| min | 15.0 | 1.0 | -30.905562 | -88.366217 | 0.01 | 0.01 | 10.0 |
| 25% | 25.0 | 4.6 | 12.933284 | 73.17 | 12.988453 | 73.28 | 19.0 |
| 50% | 29.0 | 4.7 | 18.546947 | 75.898497 | 18.633934 | 76.002574 | 26.0 |
| 75% | 34.0 | 4.8 | 22.728163 | 78.044095 | 22.785049 | 78.107044 | 32.0 |
| max | 50.0 | 6.0 | 30.914057 | 88.433452 | 31.054057 | 88.563452 | 54.0 |
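
Some values in this summary look implausible: a maximum rating of 6.0, negative restaurant coordinates, and delivery coordinates as low as 0.01. Below is a minimal sketch of how such rows might be flagged. It assumes hypothetical plausibility bounds: ratings between 1 and 5, and coordinates inside a rough bounding box for India. The bounds, the names `PLAUSIBLE_RANGES` and `flag_out_of_range`, and the three sample rows are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical plausibility bounds (assumptions, not taken from the dataset documentation)
PLAUSIBLE_RANGES = {
    "Delivery_person_Ratings": (1.0, 5.0),
    "Restaurant_latitude": (6.0, 38.0),
    "Restaurant_longitude": (68.0, 98.0),
}


def flag_out_of_range(frame, ranges):
    """Return a boolean Series that is True for rows with any value outside its range."""
    bad = pd.Series(False, index=frame.index)
    for col, (low, high) in ranges.items():
        # between() is inclusive on both ends; missing values are also flagged
        bad |= ~frame[col].between(low, high)
    return bad


# Tiny illustrative sample: row 0 mirrors df.head(); rows 1 and 2 are made up
# to reproduce the extremes reported by df.describe()
sample = pd.DataFrame({
    "Delivery_person_Ratings": [4.9, 6.0, 4.5],
    "Restaurant_latitude": [22.745049, 12.913041, -30.905562],
    "Restaurant_longitude": [75.892471, 77.683237, -88.366217],
})
suspicious = flag_out_of_range(sample, PLAUSIBLE_RANGES)
print(sample[suspicious])
```

Whether flagged rows should be dropped, corrected (for example by taking absolute values of the coordinates), or kept is a separate decision that depends on how the errors arose.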


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45593 entries, 0 to 45592
Data columns (total 11 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ID                           45593 non-null  object 
 1   Delivery_person_ID           45593 non-null  object 
 2   Delivery_person_Age          45593 non-null  int64  
 3   Delivery_person_Ratings      45593 non-null  float64
 4   Restaurant_latitude          45593 non-null  float64
 5   Restaurant_longitude         45593 non-null  float64
 6   Delivery_location_latitude   45593 non-null  float64
 7   Delivery_location_longitude  45593 non-null  float64
 8   Type_of_order                45593 non-null  object 
 9   Type_of_vehicle              45593 non-null  object 
 10  Time_taken(min)              45593 non-null  int64  
dtypes: float64(5), int64(2), object(4)
memory usage: 3.8+ MB


In [8]:
# checking for null values
df.isnull().sum()


ID                             0
Delivery_person_ID             0
Delivery_person_Age            0
Delivery_person_Ratings        0
Restaurant_latitude            0
Restaurant_longitude           0
Delivery_location_latitude     0
Delivery_location_longitude    0
Type_of_order                  0
Type_of_vehicle                0
Time_taken(min)                0
dtype: int64

In [10]:
# checking for duplicate values
df.duplicated().sum()



0
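
Called without arguments, `df.duplicated()` only flags rows that repeat in every column, so a count of 0 does not rule out repeated keys. The unique-value listing further down reports 45451 distinct `ID` values for 45593 rows, which suggests some IDs occur more than once. Below is a minimal sketch of a per-column check, assuming `ID` is meant to identify an order uniquely (the notebook does not state this). The sample frame is illustrative.

```python
import pandas as pd

# Illustrative frame: two rows share an ID but differ in another column
sample = pd.DataFrame({
    "ID": ["4607", "B379", "4607"],
    "Time_taken(min)": [24, 33, 26],
})

full_row_dupes = sample.duplicated().sum()         # compares all columns
id_dupes = sample.duplicated(subset=["ID"]).sum()  # compares the ID column only
print(full_row_dupes, id_dupes)

# keep=False marks every member of a duplicated group, which helps inspection
print(sample[sample.duplicated(subset=["ID"], keep=False)])
```

On the real data, the same `subset=["ID"]` call on `df` would show how many rows reuse an earlier ID.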

In [13]:

max_display = 10  # Maximum number of unique values to display per column

# Print the unique-value count and a short sample for every column
for col in df.columns:
    unique_vals = df[col].unique()
    print(f"Column: {col}")
    print(f" Unique Count: {len(unique_vals)}")
    print(f" Unique Values (first {min(len(unique_vals), max_display)}): {unique_vals[:max_display]}")
    print("-" * 50)


Column: ID
 Unique Count: 45451
 Unique Values (first 10): ['4607' 'B379' '5D6D' '7A6A' '70A2' '9BB4' '95B4' '9EB2' '1102' 'CDCD']
--------------------------------------------------
Column: Delivery_person_ID
 Unique Count: 1320
 Unique Values (first 10): ['INDORES13DEL02' 'BANGRES18DEL02' 'BANGRES19DEL01' 'COIMBRES13DEL02'
 'CHENRES12DEL01' 'HYDRES09DEL03' 'RANCHIRES15DEL01' 'MYSRES15DEL02'
 'HYDRES05DEL02' 'DEHRES17DEL01']
--------------------------------------------------
Column: Delivery_person_Age
 Unique Count: 22
 Unique Values (first 10): [37 34 23 38 32 22 33 35 36 21]
--------------------------------------------------
Column: Delivery_person_Ratings
 Unique Count: 28
 Unique Values (first 10): [4.9 4.5 4.4 4.7 4.6 4.8 4.2 4.3 4.  4.1]
--------------------------------------------------
Column: Restaurant_latitude
 Unique Count: 657
 Unique Values (first 10): [22.745049 12.913041 12.914264 11.003669 12.972793 17.431668 23.369746
 12.352058 17.433809 30.327968]
-----------------