### FDAE Team 2: Keng Jia Chi, Jasmine Tye Jia Wen, Lee Qi Yuan

## Problem Statement: How can we predict time taken for food delivery accurately and precisely?
### Relevance of Problem Statement:
Accurately predicting food delivery time is a critical challenge, particularly in the context of popular food delivery platforms like FoodPanda, Grab, and Deliveroo. Addressing this issue holds significant relevance as it directly impacts various stakeholders, including customers, delivery drivers, and businesses. By analyzing and resolving this problem, businesses can enhance customer satisfaction, optimize operational efficiency, ensure timely food delivery, improve driver satisfaction, gain a competitive edge, and potentially increase revenue within the competitive food delivery industry.

A more accurate food delivery time prediction would also benefit consumers by allowing them to plan meals better and avoid frustration and wasted waiting time. Consumers would also be able to make more informed decisions when choosing restaurants on delivery platforms, set realistic expectations, and avoid disappointment due to unexpected delays.

Overall, accurately predicting food delivery time is relevant because it benefits both consumers and delivery services.

## Import Relevant packages



In [None]:
!pip install xgboost==2.0.3
!pip install lightgbm
!pip install optuna
!pip install art==6.1

import numpy as np
import pandas as pd
import seaborn as sb
import scipy.stats as ss
import optuna

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
import lightgbm as lgb
from sklearn.ensemble import VotingRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from art import text2art
from termcolor import colored

import matplotlib.pyplot as plt # we only need pyplot
sb.set() # set the default Seaborn style for graphics
from math import radians, sin, cos, sqrt, atan2


## Import of Dataset



In [None]:
foodData = pd.read_csv('FoodDeliveryData.csv')
foodData

Unnamed: 0,ID,Delivery_person_ID,Delivery_person_Age,Delivery_person_Ratings,Restaurant_latitude,Restaurant_longitude,Delivery_location_latitude,Delivery_location_longitude,Order_Date,Time_Orderd,Time_Order_picked,Weatherconditions,Road_traffic_density,Vehicle_condition,Type_of_order,Type_of_vehicle,multiple_deliveries,Festival,City,Time_taken(min)
0,0x4607,INDORES13DEL02,37,4.9,22.745049,75.892471,22.765049,75.912471,19-03-2022,11:30:00,11:45:00,conditions Sunny,High,2,Snack,motorcycle,0,No,Urban,(min) 24
1,0xb379,BANGRES18DEL02,34,4.5,12.913041,77.683237,13.043041,77.813237,25-03-2022,19:45:00,19:50:00,conditions Stormy,Jam,2,Snack,scooter,1,No,Metropolitian,(min) 33
2,0x5d6d,BANGRES19DEL01,23,4.4,12.914264,77.678400,12.924264,77.688400,19-03-2022,08:30:00,08:45:00,conditions Sandstorms,Low,0,Drinks,motorcycle,1,No,Urban,(min) 26
3,0x7a6a,COIMBRES13DEL02,38,4.7,11.003669,76.976494,11.053669,77.026494,05-04-2022,18:00:00,18:10:00,conditions Sunny,Medium,0,Buffet,motorcycle,1,No,Metropolitian,(min) 21
4,0x70a2,CHENRES12DEL01,32,4.6,12.972793,80.249982,13.012793,80.289982,26-03-2022,13:30:00,13:45:00,conditions Cloudy,High,1,Snack,scooter,1,No,Metropolitian,(min) 30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45588,0x7c09,JAPRES04DEL01,30,4.8,26.902328,75.794257,26.912328,75.804257,24-03-2022,11:35:00,11:45:00,conditions Windy,High,1,Meal,motorcycle,0,No,Metropolitian,(min) 32
45589,0xd641,AGRRES16DEL01,21,4.6,0.000000,0.000000,0.070000,0.070000,16-02-2022,19:55:00,20:10:00,conditions Windy,Jam,0,Buffet,motorcycle,1,No,Metropolitian,(min) 36
45590,0x4f8d,CHENRES08DEL03,30,4.9,13.022394,80.242439,13.052394,80.272439,11-03-2022,23:50:00,00:05:00,conditions Cloudy,Low,1,Drinks,scooter,0,No,Metropolitian,(min) 16
45591,0x5eee,COIMBRES11DEL01,20,4.7,11.001753,76.986241,11.041753,77.026241,07-03-2022,13:35:00,13:40:00,conditions Cloudy,High,0,Snack,motorcycle,1,No,Metropolitian,(min) 26


#### This dataset has 20 columns.

#### **Categorical Data:**

Weatherconditions \- Weather condition during delivery \(e.g. Sunny, Stormy\)

Road\_traffic\_density \- Road traffic density during delivery \(e.g. Low, Medium\)

Vehicle\_condition \- Condition of delivery vehicle \(0 \- bad, 2 \- very good\)

Type\_of\_order \- Order type \(e.g. Buffet, Snack\)

Type\_of\_vehicle \- Delivery vehicle type \(e.g. motorcycle, scooter\)

multiple\_deliveries \- Whether the delivery person is handling multiple deliveries in a single trip \(e.g. 0 \- only that specific delivery order, 1 \- one or more other orders to deliver before/after\)

Festival \- Whether the delivery day is a festival day

City \- Type of delivery location \(e.g. Metropolitian, Urban\)

#### **Numerical, Identifier and Date/Time Data:**

Delivery\_person\_Age \- Age of delivery person

ID \- Row ID

Delivery\_person\_ID \- Delivery person ID

Delivery\_person\_Ratings \- Ratings of the delivery person

Restaurant\_latitude \- restaurant location latitude

Restaurant\_longitude \- restaurant location longitude

Delivery\_location\_latitude \- delivery location latitude

Delivery\_location\_longitude \- delivery location longitude

Order\_Date \- date of the order placed by customer

Time\_Orderd \- time customer placed the delivery order

Time\_Order\_picked \- time delivery person picked up the delivery order

Time\_taken\(min\) \- time taken for the successful delivery in minutes



In [None]:
foodData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45593 entries, 0 to 45592
Data columns (total 20 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ID                           45593 non-null  object 
 1   Delivery_person_ID           45593 non-null  object 
 2   Delivery_person_Age          45593 non-null  object 
 3   Delivery_person_Ratings      45593 non-null  object 
 4   Restaurant_latitude          45593 non-null  float64
 5   Restaurant_longitude         45593 non-null  float64
 6   Delivery_location_latitude   45593 non-null  float64
 7   Delivery_location_longitude  45593 non-null  float64
 8   Order_Date                   45593 non-null  object 
 9   Time_Orderd                  45593 non-null  object 
 10  Time_Order_picked            45593 non-null  object 
 11  Weatherconditions            45593 non-null  object 
 12  Road_traffic_density         45593 non-null  object 
 13  Vehicle_conditio

In [None]:
foodData.shape

(45593, 20)

In [None]:
foodData.describe()

Unnamed: 0,Restaurant_latitude,Restaurant_longitude,Delivery_location_latitude,Delivery_location_longitude,Vehicle_condition
count,45593.0,45593.0,45593.0,45593.0,45593.0
mean,17.017729,70.231332,17.465186,70.845702,1.023359
std,8.185109,22.883647,7.335122,21.118812,0.839065
min,-30.905562,-88.366217,0.01,0.01,0.0
25%,12.933284,73.17,12.988453,73.28,0.0
50%,18.546947,75.898497,18.633934,76.002574,1.0
75%,22.728163,78.044095,22.785049,78.107044,2.0
max,30.914057,88.433452,31.054057,88.563452,3.0


In [None]:
# Create a new data frame with only the ID column
ID_data = foodData[['ID']]
unique_ids_count = ID_data['ID'].nunique()
print("Number of unique IDs:", unique_ids_count)

Number of unique IDs: 45593


This shows that every row in our dataset has a unique ID, so there are no duplicate rows.
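As a complementary check, unique IDs alone do not rule out two orders with identical content under different IDs. The following is a minimal sketch on a toy frame (the values are illustrative, not taken from the dataset) of how `DataFrame.duplicated` could be used to look for such repeats once the ID column is ignored:

```python
import pandas as pd

# Toy frame standing in for foodData (illustrative values only)
toy = pd.DataFrame({
    "ID": ["0x1", "0x2", "0x3"],
    "Delivery_person_Age": ["37", "34", "37"],
    "Type_of_order": ["Snack", "Snack", "Snack"],
})

# True when every row carries its own identifier
ids_unique = toy["ID"].is_unique

# Number of rows that repeat an earlier row once the ID column is ignored
content_duplicates = int(toy.drop(columns="ID").duplicated().sum())

print("IDs unique:", ids_unique)
print("Duplicate rows ignoring ID:", content_duplicates)
```

Whether content-level duplicates should be removed is a judgment call, since two genuinely different orders can share the same attributes.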


## Data Cleaning & Preparation



Remove null values from the dataset



In [None]:
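# Convert the string 'NaN' to np.nan so that pandas recognises it as missing
def convert_nan(df):
    # Operate on the frame passed in rather than on the global foodData
    df.replace('NaN', np.nan, regex=True, inplace=True)

convert_nan(foodData)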
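To illustrate why this conversion is needed, here is a small sketch on a toy frame. It assumes, hypothetically, that missing entries are stored as text such as `'NaN '` or `'conditions NaN'`; the exact form in the raw file is not shown above. Because the replacement value is not a string, a regex match replaces the whole cell rather than just the matched substring.

```python
import numpy as np
import pandas as pd

# Toy frame with missing values stored as text (hypothetical examples)
toy = pd.DataFrame({
    "Delivery_person_Age": ["37", "NaN ", "23"],
    "Weatherconditions": ["conditions Sunny", "conditions NaN", "conditions Windy"],
})

# pandas does not treat the text 'NaN' as missing
missing_before = int(toy.isnull().sum().sum())

# Any cell containing 'NaN' becomes a real np.nan
converted = toy.replace("NaN", np.nan, regex=True)
missing_after = int(converted.isnull().sum().sum())

print("Missing before:", missing_before)
print("Missing after:", missing_after)
```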

In [None]:
#Check null values
foodData.isnull().sum().sort_values(ascending=False)

Delivery_person_Ratings        1908
Delivery_person_Age            1854
Time_Orderd                    1731
City                           1200
multiple_deliveries             993
Weatherconditions               616
Road_traffic_density            601
Festival                        228
ID                                0
Type_of_vehicle                   0
Type_of_order                     0
Vehicle_condition                 0
Time_Order_picked                 0
Delivery_person_ID                0
Order_Date                        0
Delivery_location_longitude       0
Delivery_location_latitude        0
Restaurant_longitude              0
Restaurant_latitude               0
Time_taken(min)                   0
dtype: int64

In [None]:
# Drop rows with null values
foodData.dropna(inplace=True)
foodData.shape

(41368, 20)

Rows dropped because of null values = 45593 \- 41368 = 4225

We decided to drop these 4225 rows, as missing values may affect the accuracy of the machine learning models we will use to predict delivery time.
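To put the loss in proportion, the arithmetic can be sketched as follows, using only the two shapes reported above:

```python
rows_before = 45593  # foodData.shape[0] before dropna
rows_after = 41368   # foodData.shape[0] after dropna

rows_dropped = rows_before - rows_after
share_dropped = rows_dropped / rows_before

print(f"Dropped {rows_dropped} rows ({share_dropped:.1%} of the data)")
```

Roughly nine in ten rows remain available for analysis after the drop.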


After dropping the null values, we tidy the data to make it easier to read and to analyse. First, we copy the dataset into a new cleaned dataset so that the original is not modified. We also drop columns that are unnecessary for our analysis, such as ID, since the delivery person ID is already available.

1. Remove \(min\) from the Time\_taken\(min\) column
2. Remove the word "conditions" from the Weatherconditions column
3. Change Jam to Very High in Road\_traffic\_density
4. Remove the 'ID' column, as it is not needed in subsequent steps.



In [None]:
cleaned_Data = foodData.copy() # work on a copy so that we don't edit the original data
cleaned_Data['Time_taken(min)'] = cleaned_Data['Time_taken(min)'].str.replace(r'\(.*?\) ?', '', regex=True).str.strip() # remove "(min)" from Time_taken(min)
cleaned_Data['Weatherconditions'] = cleaned_Data['Weatherconditions'].str.replace('conditions', '', regex=False).str.strip() # remove the word "conditions" and the leftover space
cleaned_Data['Road_traffic_density'] = cleaned_Data['Road_traffic_density'].str.replace(r'\bJam\b', 'Very High', regex=True) # change Jam to Very High
cleaned_Data.drop(['ID'], axis=1, inplace=True) # remove the ID column, which is not needed
cleaned_Data

Unnamed: 0,Delivery_person_ID,Delivery_person_Age,Delivery_person_Ratings,Restaurant_latitude,Restaurant_longitude,Delivery_location_latitude,Delivery_location_longitude,Order_Date,Time_Orderd,Time_Order_picked,Weatherconditions,Road_traffic_density,Vehicle_condition,Type_of_order,Type_of_vehicle,multiple_deliveries,Festival,City,Time_taken(min)
0,INDORES13DEL02,37,4.9,22.745049,75.892471,22.765049,75.912471,19-03-2022,11:30:00,11:45:00,Sunny,High,2,Snack,motorcycle,0,No,Urban,24
1,BANGRES18DEL02,34,4.5,12.913041,77.683237,13.043041,77.813237,25-03-2022,19:45:00,19:50:00,Stormy,Very High,2,Snack,scooter,1,No,Metropolitian,33
2,BANGRES19DEL01,23,4.4,12.914264,77.678400,12.924264,77.688400,19-03-2022,08:30:00,08:45:00,Sandstorms,Low,0,Drinks,motorcycle,1,No,Urban,26
3,COIMBRES13DEL02,38,4.7,11.003669,76.976494,11.053669,77.026494,05-04-2022,18:00:00,18:10:00,Sunny,Medium,0,Buffet,motorcycle,1,No,Metropolitian,21
4,CHENRES12DEL01,32,4.6,12.972793,80.249982,13.012793,80.289982,26-03-2022,13:30:00,13:45:00,Cloudy,High,1,Snack,scooter,1,No,Metropolitian,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45588,JAPRES04DEL01,30,4.8,26.902328,75.794257,26.912328,75.804257,24-03-2022,11:35:00,11:45:00,Windy,High,1,Meal,motorcycle,0,No,Metropolitian,32
45589,AGRRES16DEL01,21,4.6,0.000000,0.000000,0.070000,0.070000,16-02-2022,19:55:00,20:10:00,Windy,Very High,0,Buffet,motorcycle,1,No,Metropolitian,36
45590,CHENRES08DEL03,30,4.9,13.022394,80.242439,13.052394,80.272439,11-03-2022,23:50:00,00:05:00,Cloudy,Low,1,Drinks,scooter,0,No,Metropolitian,16
45591,COIMBRES11DEL01,20,4.7,11.001753,76.986241,11.041753,77.026241,07-03-2022,13:35:00,13:40:00,Cloudy,High,0,Snack,motorcycle,1,No,Metropolitian,26
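The string clean-up above can be tried on a handful of values before running it on the full dataset. This is only a sketch on a toy frame whose values mimic the raw table shown earlier:

```python
import pandas as pd

# Toy values in the same shape as the raw columns
toy = pd.DataFrame({
    "Time_taken(min)": ["(min) 24", "(min) 33"],
    "Weatherconditions": ["conditions Sunny", "conditions Stormy"],
    "Road_traffic_density": ["High", "Jam"],
})

# Strip the "(min) " prefix, leaving only the number as text
time_taken = toy["Time_taken(min)"].str.replace(r"\(.*?\) ?", "", regex=True).str.strip()

# Drop the literal word "conditions" and the space it leaves behind
weather = toy["Weatherconditions"].str.replace("conditions", "", regex=False).str.strip()

# Rename the "Jam" level so all traffic levels read as a scale
traffic = toy["Road_traffic_density"].str.replace(r"\bJam\b", "Very High", regex=True)

print(time_taken.tolist(), weather.tolist(), traffic.tolist())
```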


We also convert time taken, delivery person age and delivery person ratings from strings to numeric types (integers for time taken and age, floats for ratings) so that our data analysis can be done correctly.

In [None]:
# Convert string columns to numeric types
# 1. Time_taken(min)
cleaned_Data['Time_taken(min)'] = cleaned_Data['Time_taken(min)'].astype(int) # convert to int
# 2. Delivery_person_Age
cleaned_Data['Delivery_person_Age'] = cleaned_Data['Delivery_person_Age'].astype(int) # convert to int
# 3. Delivery_person_Ratings
cleaned_Data['Delivery_person_Ratings'] = cleaned_Data['Delivery_person_Ratings'].astype(float) # convert to float, as ratings have decimals

cleaned_Data

Unnamed: 0,Delivery_person_ID,Delivery_person_Age,Delivery_person_Ratings,Restaurant_latitude,Restaurant_longitude,Delivery_location_latitude,Delivery_location_longitude,Order_Date,Time_Orderd,Time_Order_picked,Weatherconditions,Road_traffic_density,Vehicle_condition,Type_of_order,Type_of_vehicle,multiple_deliveries,Festival,City,Time_taken(min)
0,INDORES13DEL02,37,4.9,22.745049,75.892471,22.765049,75.912471,19-03-2022,11:30:00,11:45:00,Sunny,High,2,Snack,motorcycle,0,No,Urban,24
1,BANGRES18DEL02,34,4.5,12.913041,77.683237,13.043041,77.813237,25-03-2022,19:45:00,19:50:00,Stormy,Very High,2,Snack,scooter,1,No,Metropolitian,33
2,BANGRES19DEL01,23,4.4,12.914264,77.678400,12.924264,77.688400,19-03-2022,08:30:00,08:45:00,Sandstorms,Low,0,Drinks,motorcycle,1,No,Urban,26
3,COIMBRES13DEL02,38,4.7,11.003669,76.976494,11.053669,77.026494,05-04-2022,18:00:00,18:10:00,Sunny,Medium,0,Buffet,motorcycle,1,No,Metropolitian,21
4,CHENRES12DEL01,32,4.6,12.972793,80.249982,13.012793,80.289982,26-03-2022,13:30:00,13:45:00,Cloudy,High,1,Snack,scooter,1,No,Metropolitian,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45588,JAPRES04DEL01,30,4.8,26.902328,75.794257,26.912328,75.804257,24-03-2022,11:35:00,11:45:00,Windy,High,1,Meal,motorcycle,0,No,Metropolitian,32
45589,AGRRES16DEL01,21,4.6,0.000000,0.000000,0.070000,0.070000,16-02-2022,19:55:00,20:10:00,Windy,Very High,0,Buffet,motorcycle,1,No,Metropolitian,36
45590,CHENRES08DEL03,30,4.9,13.022394,80.242439,13.052394,80.272439,11-03-2022,23:50:00,00:05:00,Cloudy,Low,1,Drinks,scooter,0,No,Metropolitian,16
45591,COIMBRES11DEL01,20,4.7,11.001753,76.986241,11.041753,77.026241,07-03-2022,13:35:00,13:40:00,Cloudy,High,0,Snack,motorcycle,1,No,Metropolitian,26
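`astype` raises an error as soon as it meets a value it cannot parse. As a more defensive alternative (not what the notebook does), `pd.to_numeric` with `errors="coerce"` turns unparseable entries into NaN so they can be counted first. A sketch on toy values, where `"oops"` is a deliberately invalid entry:

```python
import pandas as pd

# Toy frame; "oops" is a deliberately bad value for illustration
toy = pd.DataFrame({
    "Time_taken(min)": ["24", "33", "26"],
    "Delivery_person_Age": ["37", "34", "oops"],
    "Delivery_person_Ratings": ["4.9", "4.5", "4.4"],
})

# Convert every column, turning unparseable text into NaN instead of raising
converted = toy.apply(pd.to_numeric, errors="coerce")

# Count the cells that could not be converted, per column
bad_cells = converted.isnull().sum()

print(converted.dtypes)
print(bad_cells)
```

Columns with no bad cells can then be cast to `int` safely, while the others need a decision about the offending rows.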


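Next, we compute the distance in kilometres between the restaurant and the delivery location from their latitudes and longitudes, so that distance can be used in our analysis. Before doing so, we check the restaurant coordinates for negative latitudes.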

In [None]:
cleaned_Data[cleaned_Data['Restaurant_latitude']<0][['Restaurant_latitude','Restaurant_longitude']]

Unnamed: 0,Restaurant_latitude,Restaurant_longitude
92,-27.163303,78.057044
283,-27.165108,78.015053
1091,-15.546594,73.760431
1783,-23.230791,77.437020
1976,-22.539129,88.365507
...,...,...
43860,-15.498603,73.826911
44051,-15.157944,73.950889
44640,-9.982834,76.283268
44933,-19.874733,75.353942


If we look up these coordinates in a geographic coordinate tool, they fall somewhere in the middle of the ocean, which cannot be right. The negative sign is most likely an error, because the same coordinates with a positive latitude correspond to valid locations. We therefore convert the restaurant latitudes to positive values.


In [None]:
cleaned_Data['Restaurant_latitude']=cleaned_Data['Restaurant_latitude'].abs()
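The earlier `describe()` output (taken before any rows were dropped) also reports a negative minimum for `Restaurant_longitude`, so the same kind of sign error may affect longitudes. The sketch below, on a toy frame, shows one way to count and correct negative values in both columns. Applying it to longitude is an assumption and is not part of the notebook; the negative longitude here is hypothetical.

```python
import pandas as pd

# Toy frame: first row uses a latitude from the table above,
# the negative longitude in the second row is hypothetical
toy = pd.DataFrame({
    "Restaurant_latitude": [-27.163303, 22.745049],
    "Restaurant_longitude": [78.057044, -75.892471],
})

coord_cols = ["Restaurant_latitude", "Restaurant_longitude"]

# Count negative entries per column before the correction
negatives_before = (toy[coord_cols] < 0).sum()

# Flip the sign of any negative coordinate
toy[coord_cols] = toy[coord_cols].abs()

# Confirm that nothing negative remains
negatives_after = (toy[coord_cols] < 0).sum()

print(negatives_before)
print(negatives_after)
```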

In [None]:
#Calculate distance between restaurant location & delivery location
def haversine_distance(lat1, lon1, lat2, lon2):
    # Convert latitude and longitude from degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])

    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    distance = 6371 * c  # Radius of the Earth in kilometers
    return distance

cleaned_Data['Distance_between_restaurant_and_delivery_place(KM)'] = cleaned_Data.apply(lambda row: round(haversine_distance(row['Restaurant_latitude'], row['Restaurant_longitude'], row['Delivery_location_latitude'], row['Delivery_location_longitude'])), axis=1)
cleaned_Data

Unnamed: 0,Delivery_person_ID,Delivery_person_Age,Delivery_person_Ratings,Restaurant_latitude,Restaurant_longitude,Delivery_location_latitude,Delivery_location_longitude,Order_Date,Time_Orderd,Time_Order_picked,Weatherconditions,Road_traffic_density,Vehicle_condition,Type_of_order,Type_of_vehicle,multiple_deliveries,Festival,City,Time_taken(min),Distance_between_restaurant_and_delivery_place(KM)
0,INDORES13DEL02,37,4.9,22.745049,75.892471,22.765049,75.912471,19-03-2022,11:30:00,11:45:00,Sunny,High,2,Snack,motorcycle,0,No,Urban,24,3
1,BANGRES18DEL02,34,4.5,12.913041,77.683237,13.043041,77.813237,25-03-2022,19:45:00,19:50:00,Stormy,Very High,2,Snack,scooter,1,No,Metropolitian,33,20
2,BANGRES19DEL01,23,4.4,12.914264,77.678400,12.924264,77.688400,19-03-2022,08:30:00,08:45:00,Sandstorms,Low,0,Drinks,motorcycle,1,No,Urban,26,2
3,COIMBRES13DEL02,38,4.7,11.003669,76.976494,11.053669,77.026494,05-04-2022,18:00:00,18:10:00,Sunny,Medium,0,Buffet,motorcycle,1,No,Metropolitian,21,8
4,CHENRES12DEL01,32,4.6,12.972793,80.249982,13.012793,80.289982,26-03-2022,13:30:00,13:45:00,Cloudy,High,1,Snack,scooter,1,No,Metropolitian,30,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45588,JAPRES04DEL01,30,4.8,26.902328,75.794257,26.912328,75.804257,24-03-2022,11:35:00,11:45:00,Windy,High,1,Meal,motorcycle,0,No,Metropolitian,32,1
45589,AGRRES16DEL01,21,4.6,0.000000,0.000000,0.070000,0.070000,16-02-2022,19:55:00,20:10:00,Windy,Very High,0,Buffet,motorcycle,1,No,Metropolitian,36,11
45590,CHENRES08DEL03,30,4.9,13.022394,80.242439,13.052394,80.272439,11-03-2022,23:50:00,00:05:00,Cloudy,Low,1,Drinks,scooter,0,No,Metropolitian,16,5
45591,COIMBRES11DEL01,20,4.7,11.001753,76.986241,11.041753,77.026241,07-03-2022,13:35:00,13:40:00,Cloudy,High,0,Snack,motorcycle,1,No,Metropolitian,26,6
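Row-wise `apply` calls the Python function once per row, which can be slow on tens of thousands of rows. A vectorized variant using NumPy is sketched below; the function name `haversine_km` is our own, and it uses the equivalent `arcsin` form of the haversine formula with the same Earth radius. The coordinates are the first two rows of the table above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371  # same radius as in haversine_distance

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# First two rows of the cleaned data
rest_lat = np.array([22.745049, 12.913041])
rest_lon = np.array([75.892471, 77.683237])
dest_lat = np.array([22.765049, 13.043041])
dest_lon = np.array([75.912471, 77.813237])

distances = haversine_km(rest_lat, rest_lon, dest_lat, dest_lon)
rounded = np.round(distances).astype(int)

print(rounded)
```

On a DataFrame, the same function could be called with whole columns, for example `haversine_km(df['Restaurant_latitude'], ...)`, instead of going row by row.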


Having computed the distance from the longitudes and latitudes, we drop the coordinate columns, along with Delivery\_person\_ID, Order\_Date and multiple\_deliveries, as they will no longer be used to predict the time taken for delivery.

In [None]:
cleaned_Data.drop(['Delivery_person_ID', 'Restaurant_latitude', 'Restaurant_longitude', 'Delivery_location_latitude', 'Delivery_location_longitude', 'Order_Date','multiple_deliveries'], axis=1, inplace=True)
cleaned_Data

Unnamed: 0,Delivery_person_Age,Delivery_person_Ratings,Time_Orderd,Time_Order_picked,Weatherconditions,Road_traffic_density,Vehicle_condition,Type_of_order,Type_of_vehicle,Festival,City,Time_taken(min),Distance_between_restaurant_and_delivery_place(KM)
0,37,4.9,11:30:00,11:45:00,Sunny,High,2,Snack,motorcycle,No,Urban,24,3
1,34,4.5,19:45:00,19:50:00,Stormy,Very High,2,Snack,scooter,No,Metropolitian,33,20
2,23,4.4,08:30:00,08:45:00,Sandstorms,Low,0,Drinks,motorcycle,No,Urban,26,2
3,38,4.7,18:00:00,18:10:00,Sunny,Medium,0,Buffet,motorcycle,No,Metropolitian,21,8
4,32,4.6,13:30:00,13:45:00,Cloudy,High,1,Snack,scooter,No,Metropolitian,30,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...
45588,30,4.8,11:35:00,11:45:00,Windy,High,1,Meal,motorcycle,No,Metropolitian,32,1
45589,21,4.6,19:55:00,20:10:00,Windy,Very High,0,Buffet,motorcycle,No,Metropolitian,36,11
45590,30,4.9,23:50:00,00:05:00,Cloudy,Low,1,Drinks,scooter,No,Metropolitian,16,5
45591,20,4.7,13:35:00,13:40:00,Cloudy,High,0,Snack,motorcycle,No,Metropolitian,26,6

