
# About Dataset

This dataset is designed for predicting food delivery times based on various influencing factors such as distance, weather, traffic conditions, and time of day. It offers a practical and engaging challenge for machine learning practitioners, especially those interested in logistics and operations research.

## Key Features:

- Order_ID: Unique identifier for each order.

- Distance_km: The delivery distance in kilometers.

- Weather: Weather conditions during the delivery, including Clear, Rainy, Snowy, Foggy, and Windy.

- Traffic_Level: Traffic conditions categorized as Low, Medium, or High.

- Time_of_Day: The time when the delivery took place, categorized as Morning, Afternoon, Evening, or Night.

- Vehicle_Type: Type of vehicle used for delivery, including Bike, Scooter, and Car.

- Preparation_Time_min: The time required to prepare the order, measured in minutes.

- Courier_Experience_yrs: Experience of the courier in years.

- Delivery_Time_min: The total delivery time in minutes (target variable).
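
The category labels listed above can serve as a simple validation schema. The following is a minimal sketch, not part of the original notebook: the helper name `unexpected_labels` and the sample rows are hypothetical, and only the allowed labels come from the feature list.

```python
import pandas as pd

# Allowed labels per categorical column, copied from the feature list above.
EXPECTED_CATEGORIES = {
    "Weather": {"Clear", "Rainy", "Snowy", "Foggy", "Windy"},
    "Traffic_Level": {"Low", "Medium", "High"},
    "Time_of_Day": {"Morning", "Afternoon", "Evening", "Night"},
    "Vehicle_Type": {"Bike", "Scooter", "Car"},
}


def unexpected_labels(frame: pd.DataFrame) -> dict:
    """Return, per column, any non-null labels outside the documented set."""
    problems = {}
    for column, allowed in EXPECTED_CATEGORIES.items():
        seen = set(frame[column].dropna().unique())
        extra = seen - allowed
        if extra:
            problems[column] = extra
    return problems


# Hand-made sample (not rows from the real file); "Stormy" is deliberately invalid.
sample = pd.DataFrame({
    "Weather": ["Clear", "Stormy", None],
    "Traffic_Level": ["Low", "High", "Medium"],
    "Time_of_Day": ["Night", None, "Morning"],
    "Vehicle_Type": ["Bike", "Car", "Scooter"],
})
print(unexpected_labels(sample))  # {'Weather': {'Stormy'}}
```

Missing values are ignored here on purpose, since they are handled separately during imputation.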

In [1]:
import pandas as pd
import numpy as np


In [4]:
df=pd.read_csv(r'D:\Food Delivery Time Prediction\Data\RawData\Food_Delivery_Times.csv')
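
An absolute Windows path only works on the original machine. As a hedged alternative, the path could be built relative to the notebook. This sketch assumes the notebook sits in a folder next to `Data`, which is what the relative `../Data/Processed` save path at the end of the notebook suggests.

```python
from pathlib import Path

# Assumption: the notebook lives in a folder that is a sibling of "Data".
raw_file = Path("..") / "Data" / "RawData" / "Food_Delivery_Times.csv"
print(raw_file.as_posix())  # ../Data/RawData/Food_Delivery_Times.csv

# On a machine where the file exists, it could then be loaded with:
# df = pd.read_csv(raw_file)
```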

In [5]:
df.head()

Unnamed: 0,Order_ID,Distance_km,Weather,Traffic_Level,Time_of_Day,Vehicle_Type,Preparation_Time_min,Courier_Experience_yrs,Delivery_Time_min
0,522,7.93,Windy,Low,Afternoon,Scooter,12,1.0,43
1,738,16.42,Clear,Medium,Evening,Bike,20,2.0,84
2,741,9.52,Foggy,Low,Night,Scooter,28,1.0,59
3,661,7.44,Rainy,Medium,Afternoon,Scooter,5,1.0,37
4,412,19.03,Clear,Low,Morning,Bike,16,5.0,68


In [20]:
df.shape

(1000, 9)

In [33]:
df[df.isna().any(axis=1)]

Unnamed: 0,Order_ID,Distance_km,Weather,Traffic_Level,Time_of_Day,Vehicle_Type,Preparation_Time_min,Courier_Experience_yrs,Delivery_Time_min
6,627,9.52,Clear,Low,,Bike,12,1.0,49
14,939,2.80,Clear,High,Morning,Scooter,10,,33
24,211,11.20,Clear,Medium,Morning,Bike,23,,73
42,313,0.99,,Medium,Evening,Bike,15,,32
71,494,4.17,,Low,Evening,Scooter,5,1.0,22
...,...,...,...,...,...,...,...,...,...
974,414,11.68,Clear,,Afternoon,Scooter,25,7.0,70
976,344,8.96,Snowy,,Morning,Car,6,5.0,51
987,331,7.44,Rainy,Low,Evening,Bike,27,,53
988,215,14.39,Rainy,Medium,Morning,Scooter,6,,50
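
The filter above lists the affected rows. To summarise them instead, the same boolean mask can be counted. A small sketch on a made-up frame (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Weather": ["Clear", np.nan, "Rainy", "Clear"],
    "Courier_Experience_yrs": [1.0, np.nan, np.nan, 4.0],
    "Delivery_Time_min": [43, 32, 53, 68],
})

rows_with_gaps = toy.isna().any(axis=1)   # True where at least one column is null
gaps_per_row = toy.isna().sum(axis=1)     # number of null columns in each row

print(int(rows_with_gaps.sum()))  # 2
print(gaps_per_row.tolist())      # [0, 2, 1, 0]
```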


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Order_ID                1000 non-null   int64  
 1   Distance_km             1000 non-null   float64
 2   Weather                 970 non-null    object 
 3   Traffic_Level           970 non-null    object 
 4   Time_of_Day             970 non-null    object 
 5   Vehicle_Type            1000 non-null   object 
 6   Preparation_Time_min    1000 non-null   int64  
 7   Courier_Experience_yrs  970 non-null    float64
 8   Delivery_Time_min       1000 non-null   int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 70.4+ KB


In [10]:
df.Weather.unique()

array(['Windy', 'Clear', 'Foggy', 'Rainy', 'Snowy', nan], dtype=object)

In [11]:
df.Traffic_Level.unique()

array(['Low', 'Medium', 'High', nan], dtype=object)

In [15]:
df.Time_of_Day.unique()

array(['Afternoon', 'Evening', 'Night', 'Morning', nan], dtype=object)

In [13]:
df.Vehicle_Type.unique()

array(['Scooter', 'Bike', 'Car'], dtype=object)
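
`unique()` shows which labels occur but not how often. When choosing an imputation strategy, `value_counts(dropna=False)` may be more informative because it also counts the missing entries. A sketch on an illustrative series (not real dataset values):

```python
import numpy as np
import pandas as pd

weather = pd.Series(
    ["Clear", "Rainy", np.nan, "Clear", "Windy", np.nan, "Clear"],
    name="Weather",
)

# dropna=False keeps NaN as its own row in the frequency table.
counts = weather.value_counts(dropna=False)
print(counts)
```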

In [102]:
df.isna().sum()

Distance_km                0
Preparation_Time_min       0
Courier_Experience_yrs    30
Delivery_Time_min          0
Weather_Foggy              0
Weather_Rainy              0
Weather_Snowy              0
Weather_Windy              0
Vehicle_Type_Car           0
Vehicle_Type_Scooter       0
Traffic_Level_encoded      0
Time_of_day_encoded        0
dtype: int64

# Weather, Traffic_Level, and Time_of_Day contain null values, so we replace them with each column's mode (most frequent value)

In [37]:
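weather_mode=df.Weather.mode()[0]
df['Weather']=df['Weather'].fillna(weather_mode)

In [39]:
traffic_level_mode=df.Traffic_Level.mode()[0]
df['Traffic_Level']=df['Traffic_Level'].fillna(traffic_level_mode)

In [42]:
## mode()[0] is the most frequent label; unique()[0] would only be the first label seen
time_of_day_mode=df.Time_of_Day.mode()[0]
df['Time_of_Day']=df['Time_of_Day'].fillna(time_of_day_mode)

In [45]:
## Vehicle_Type has no nulls according to df.info(), so this fill is only a safeguard
vehicle_type_mode=df.Vehicle_Type.mode()[0]
df['Vehicle_Type']=df['Vehicle_Type'].fillna(vehicle_type_mode)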
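
The per-column mode imputation can also be written as a single loop. This is a minimal sketch on a made-up frame. `mode()` ignores NaN by default and returns its results sorted, so `[0]` picks the first label alphabetically when two labels tie.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Weather": ["Clear", "Clear", np.nan, "Rainy"],
    "Traffic_Level": ["Low", np.nan, "High", "High"],
})

for column in ["Weather", "Traffic_Level"]:
    most_frequent = toy[column].mode()[0]            # NaN is skipped by default
    toy[column] = toy[column].fillna(most_frequent)  # assign back, no inplace

print(toy["Weather"].tolist())        # ['Clear', 'Clear', 'Clear', 'Rainy']
print(toy["Traffic_Level"].tolist())  # ['Low', 'High', 'High', 'High']
```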

In [47]:
df.head(2)

Unnamed: 0,Order_ID,Distance_km,Weather,Traffic_Level,Time_of_Day,Vehicle_Type,Preparation_Time_min,Courier_Experience_yrs,Delivery_Time_min
0,522,7.93,Windy,Low,Afternoon,Scooter,12,1.0,43
1,738,16.42,Clear,Medium,Evening,Bike,20,2.0,84


# Courier_Experience_yrs is numerical, so we replace its null values with the mean (truncated to whole years)

In [112]:
## int() truncates the mean to a whole number of years
exp_mean=int(df['Courier_Experience_yrs'].mean())
## Assign the result back instead of calling fillna(inplace=True) on the selected column
df['Courier_Experience_yrs']=df['Courier_Experience_yrs'].fillna(exp_mean)
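
pandas recommends replacing chained `df[col].fillna(value, inplace=True)` calls with either an assignment or a column-to-value mapping passed to `df.fillna`. The sketch below shows both forms on illustrative values and also makes visible how `int()` truncates the mean.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Courier_Experience_yrs": [1.0, np.nan, 4.0, 9.0, np.nan]})

# mean() skips NaN: (1 + 4 + 9) / 3 = 4.666...
mean_years = toy["Courier_Experience_yrs"].mean()
truncated_years = int(mean_years)  # 4, the fractional part is dropped

# Form 1: fill the column and assign the result back.
by_assignment = toy.copy()
by_assignment["Courier_Experience_yrs"] = (
    by_assignment["Courier_Experience_yrs"].fillna(mean_years)
)

# Form 2: pass a {column: value} mapping to DataFrame.fillna.
by_mapping = toy.fillna({"Courier_Experience_yrs": mean_years})

print(truncated_years)                   # 4
print(by_assignment.equals(by_mapping))  # True
```

Whether to fill with the exact mean, the truncated mean, or the median is a modelling choice; the notebook uses the truncated mean.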


# Next we encode the categorical columns as numbers

- For Weather and Vehicle_Type we use one-hot encoding to avoid introducing a false order.
- For Traffic_Level and Time_of_Day we use ordinal encoding because we want to preserve an order.
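
The difference between the two encodings can be seen on a three-row example. Note that this sketch uses `pd.get_dummies` as a pandas-only stand-in for scikit-learn's `OneHotEncoder(drop='first')`. The rows are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Weather": ["Clear", "Windy", "Foggy"],
    "Traffic_Level": ["Low", "High", "Medium"],
})

# Nominal column: one indicator per label; the first label ("Clear") is
# dropped and becomes the all-zeros baseline.
weather_dummies = pd.get_dummies(
    toy["Weather"], prefix="Weather", drop_first=True, dtype=float
)

# Ordinal column: integers whose order mirrors Low < Medium < High.
traffic_rank = toy["Traffic_Level"].map({"Low": 1, "Medium": 2, "High": 3})

print(list(weather_dummies.columns))  # ['Weather_Foggy', 'Weather_Windy']
print(traffic_rank.tolist())          # [1, 3, 2]
```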

In [49]:
from sklearn.preprocessing import OneHotEncoder

In [56]:
weather_encoding=OneHotEncoder(drop='first')
weather_encoded=weather_encoding.fit_transform(df[['Weather']]).toarray()

In [57]:
weather_encoded_df=pd.DataFrame(weather_encoded,columns=weather_encoding.get_feature_names_out())
weather_encoded_df

Unnamed: 0,Weather_Foggy,Weather_Rainy,Weather_Snowy,Weather_Windy
0,0.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0
3,0.0,1.0,0.0,0.0
4,0.0,0.0,0.0,0.0
...,...,...,...,...
995,0.0,0.0,0.0,0.0
996,0.0,1.0,0.0,0.0
997,0.0,0.0,1.0,0.0
998,0.0,0.0,0.0,0.0


In [63]:
## Adding encoded values to the data set
df=pd.concat([df.reset_index(drop=True),weather_encoded_df],axis=1)

In [65]:
## Dropping the Weather column from the dataset
df.drop('Weather',axis=1,inplace=True)
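
The `reset_index(drop=True)` before `pd.concat` matters because column-wise concatenation aligns rows by index label, and the encoded frame always has a fresh 0..n-1 index. A sketch with made-up index labels shows what may happen without it:

```python
import pandas as pd

# "left" stands in for a frame whose index is no longer 0..n-1.
left = pd.DataFrame({"Distance_km": [7.93, 16.42, 9.52]}, index=[10, 11, 12])
right = pd.DataFrame({"Weather_Windy": [1.0, 0.0, 0.0]})  # default index 0, 1, 2

misaligned = pd.concat([left, right], axis=1)
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned.shape)                   # (6, 2)
print(int(misaligned.isna().sum().sum())) # 6
print(aligned.shape)                      # (3, 2)
```

In this notebook the frame comes straight from `read_csv`, so its index is already 0..999 and the reset is a harmless safeguard.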

In [66]:
vehicle_type_encoder=OneHotEncoder(drop='first')
vehicle_type_encoded=vehicle_type_encoder.fit_transform(df[['Vehicle_Type']]).toarray()

In [69]:
vehicle_type_encoded_df=pd.DataFrame(vehicle_type_encoded,columns=vehicle_type_encoder.get_feature_names_out())


In [72]:
df=pd.concat([df.reset_index(drop=True),vehicle_type_encoded_df],axis=1)

In [73]:
df.drop('Vehicle_Type',axis=1,inplace=True)

In [None]:
## Ordinal encoding for Traffic_Level
mapping = {'Low': 1, 'Medium': 2, 'High': 3}

df['Traffic_Level_encoded']=df['Traffic_Level'].map(mapping)

df.drop('Traffic_Level',axis=1,inplace=True)
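
`Series.map` with a dictionary returns NaN for any label that is not a key, so it can be worth checking for unmapped labels before dropping the original column. A minimal sketch, where "Severe" is a hypothetical label that does not occur in the dataset:

```python
import pandas as pd

mapping = {"Low": 1, "Medium": 2, "High": 3}
traffic = pd.Series(["Low", "High", "Medium", "Severe"])

encoded = traffic.map(mapping)  # labels missing from the mapping become NaN
unmapped = sorted(set(traffic[encoded.isna()]))

print(unmapped)  # ['Severe']
```

This is also why the null values are filled before encoding: a remaining NaN in `Traffic_Level` would stay NaN after the mapping.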

In [85]:
## Ordinal encoding for Time_of_Day
time_mapping = {
    'Afternoon': 1,
    'Evening': 2,
    'Night': 3,
    'Morning': 4
}

df['Time_of_day_encoded']=df['Time_of_Day'].map(time_mapping)

In [89]:
df.drop('Time_of_Day',axis=1,inplace=True)
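
The mapping above numbers the labels starting from Afternoon. If a chronological order is preferred, one possible alternative is an ordered categorical. This is a sketch under that assumption, not the notebook's method, and whether the different order helps a model would need to be tested.

```python
import pandas as pd

# Assumed chronological order of the four labels.
order = ["Morning", "Afternoon", "Evening", "Night"]
times = pd.Series(["Afternoon", "Evening", "Night", "Morning"])

# .codes are 0-based positions in `order`; add 1 to start at 1.
# A label outside `order` would get code -1, i.e. 0 after the shift.
codes = pd.Categorical(times, categories=order, ordered=True).codes + 1

print(codes.tolist())  # [2, 3, 4, 1]
```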

In [92]:
df.columns

Index(['Order_ID', 'Distance_km', 'Preparation_Time_min',
       'Courier_Experience_yrs', 'Delivery_Time_min', 'Weather_Foggy',
       'Weather_Rainy', 'Weather_Snowy', 'Weather_Windy', 'Vehicle_Type_Car',
       'Vehicle_Type_Scooter', 'Traffic_Level_encoded', 'Time_of_day_encoded'],
      dtype='object')

## Order_ID is only an identifier and carries no information about Delivery_Time_min, so we drop it

In [93]:
df.drop('Order_ID',axis=1,inplace=True)
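
Before dropping an identifier it can be reassuring to confirm that it really is unique per row. A small sketch using the first three rows shown in `df.head()`:

```python
import pandas as pd

toy = pd.DataFrame({
    "Order_ID": [522, 738, 741],
    "Delivery_Time_min": [43, 84, 59],
})

# A column with one distinct value per row behaves like a row label,
# not like a feature a model could generalise from.
if toy["Order_ID"].is_unique:
    toy = toy.drop(columns="Order_ID")

print(list(toy.columns))  # ['Delivery_Time_min']
```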

In [95]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Distance_km             1000 non-null   float64
 1   Preparation_Time_min    1000 non-null   int64  
 2   Courier_Experience_yrs  970 non-null    float64
 3   Delivery_Time_min       1000 non-null   int64  
 4   Weather_Foggy           1000 non-null   float64
 5   Weather_Rainy           1000 non-null   float64
 6   Weather_Snowy           1000 non-null   float64
 7   Weather_Windy           1000 non-null   float64
 8   Vehicle_Type_Car        1000 non-null   float64
 9   Vehicle_Type_Scooter    1000 non-null   float64
 10  Traffic_Level_encoded   1000 non-null   int64  
 11  Time_of_day_encoded     1000 non-null   int64  
dtypes: float64(8), int64(4)
memory usage: 93.9 KB


In [117]:
df.isna().sum()

Distance_km               0
Preparation_Time_min      0
Courier_Experience_yrs    0
Delivery_Time_min         0
Weather_Foggy             0
Weather_Rainy             0
Weather_Snowy             0
Weather_Windy             0
Vehicle_Type_Car          0
Vehicle_Type_Scooter      0
Traffic_Level_encoded     0
Time_of_day_encoded       0
dtype: int64

In [118]:
## Saving the dataset file to the processed folder

import os
processed_dir = os.path.join("..", "Data", "Processed")
os.makedirs(processed_dir,exist_ok=True)
processed_file_path=os.path.join(processed_dir,"delivery_data_cleaned.csv")

df.to_csv(processed_file_path,index=False)
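
A quick round trip can confirm that the saved file has the expected shape and that `index=False` kept the row index out of the CSV. This sketch writes to a temporary directory instead of the project folder, and the two-row frame is illustrative.

```python
import os
import tempfile

import pandas as pd

cleaned = pd.DataFrame({
    "Distance_km": [7.93, 16.42],
    "Delivery_Time_min": [43, 84],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "delivery_data_cleaned.csv")
    cleaned.to_csv(out_path, index=False)  # no index column is written
    reloaded = pd.read_csv(out_path)

print(list(reloaded.columns))  # ['Distance_km', 'Delivery_Time_min']
print(reloaded.shape)          # (2, 2)
```

Without `index=False`, the reloaded frame would contain an extra `Unnamed: 0` column holding the old index.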