Uber Eats Delivery Times

You are a data scientist working at Uber Eats, one of the world's leading food delivery platforms. The company is focused on
improving the accuracy of estimated delivery times to enhance customer satisfaction and operational efficiency. Predicting delivery
time more accurately will not only improve the user experience but also help optimise routing and courier management.

Your task is to develop a machine learning model that predicts the total delivery time (in minutes) for a food order, based on
historical data. This model must use gradient descent to learn the best model parameters for predicting delivery times from the
available features.

Dataset Description

You have access to the following data for previous orders:

Order_ID: A unique identifier for each order.
Distance_km: The delivery distance in kilometres.
Weather: Weather conditions during the delivery.
Traffic_Level: Category of traffic conditions.
Time_of_Day: The time of day when the delivery took place.
Vehicle_Type: Type of vehicle used for delivery.
Preparation_Time_min: The time required to prepare the order, measured in minutes.
Courier_Experience_yrs: Experience of the courier/driver in years.
Delivery_Time_min: The total delivery time in minutes (continuous target variable).

The dataset is named "uber_eats.csv" and can be downloaded from the "Project Datasets" folder on myLMS.

3.1. Explain why Linear Regression with Gradient Descent is suitable for predicting Uber Eats delivery time.
(3 marks)

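Delivery time is a continuous target and is often approximately a linear function of features such as distance, preparation time and traffic level, so linear regression with a mean squared error (MSE) loss is a natural, interpretable baseline: each coefficient is the expected change in delivery minutes per unit change (or, after standardisation, per standard deviation) in its feature, holding the others fixed.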
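Written out, the model and its loss look as follows. The notation is introduced here for illustration: n orders, m features, a bias w_0, and a design matrix X̃ whose first column is all ones. The gradient and update lines correspond to what train_gd in the code below computes (η is its lr argument), and the positive semidefinite Hessian is why the objective is convex.

```latex
\begin{aligned}
\hat{y}_i &= w_0 + \sum_{j=1}^{m} w_j x_{ij} = \tilde{\mathbf{x}}_i^{\top}\mathbf{w} \\
J(\mathbf{w}) &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2
  = \frac{1}{n}\,\lVert \tilde{X}\mathbf{w} - \mathbf{y} \rVert_2^2 \\
\nabla J(\mathbf{w}) &= \frac{2}{n}\,\tilde{X}^{\top}\left(\tilde{X}\mathbf{w} - \mathbf{y}\right) \\
\nabla^2 J(\mathbf{w}) &= \frac{2}{n}\,\tilde{X}^{\top}\tilde{X} \succeq 0 \\
\mathbf{w} &\leftarrow \mathbf{w} - \eta\,\nabla J(\mathbf{w})
\end{aligned}
```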

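The MSE objective is convex and differentiable, so gradient descent with a suitably small learning rate converges to the global minimum rather than getting stuck in a local one. It also scales to large datasets and many features, where the normal equation becomes expensive (forming and solving the m × m system XᵀX w = Xᵀy costs roughly O(nm² + m³)) or numerically unstable (when XᵀX is ill-conditioned).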
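A quick numerical check of this point, as a minimal sketch on synthetic standardised data (the data, lr and epochs are illustrative assumptions, and gradient_descent is a hypothetical helper that applies the same update as train_gd): because the loss is convex, batch gradient descent lands on the same weights as NumPy's exact least-squares solver, np.linalg.lstsq.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the MSE loss; returns [bias, weights...]."""
    n, m = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # prepend a bias column of ones
    w = np.zeros(m + 1)
    for _ in range(epochs):
        grad = (2 / n) * Xb.T @ (Xb @ w - y)  # gradient of (1/n)||Xb w - y||^2
        w -= lr * grad
    return w

# Synthetic, already standardised features (illustrative only, not the Uber Eats data)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
y = 30 + X @ np.array([8.0, 4.0, -2.0]) + rng.normal(0, 1.0, 500)

w_gd = gradient_descent(X, y)

# Exact least-squares solution on the same design matrix, for comparison
Xb = np.hstack([np.ones((500, 1)), X])
w_exact, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(np.round(w_gd, 4))
print(np.allclose(w_gd, w_exact, atol=1e-6))
```

The learning rate still matters: gradient descent on this quadratic loss only converges when lr is below 2 divided by the largest Hessian eigenvalue, and with standardised features those eigenvalues sit close to 2, which is why a step of 0.1 is comfortably stable here.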

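Gradient descent works well with feature standardisation, which puts features measured in kilometres, minutes and years on a common scale and speeds up convergence. It also accommodates regularisation by adding the penalty's gradient to each update (directly for L2, via a subgradient for L1), which reduces overfitting while keeping the coefficients interpretable.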
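A sketch of both ideas together, under stated assumptions: standardise uses training-set statistics only (mirroring StandardScaler, which uses the population standard deviation), and ridge_gd adds the gradient of an L2 penalty lam * ||w||² to the same MSE update, leaving the bias unpenalised. Both function names and the values of lam, lr and epochs are hypothetical choices for illustration, and the data are synthetic.

```python
import numpy as np

def standardise(X_train, X_test):
    """Centre and scale with training-set statistics only (population std, ddof=0)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled, as StandardScaler does
    return (X_train - mu) / sigma, (X_test - mu) / sigma

def ridge_gd(X, y, lam=0.0, lr=0.05, epochs=5000):
    """Gradient descent on MSE + lam * ||w[1:]||^2; the bias w[0] is not penalised."""
    n, m = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    w = np.zeros(m + 1)
    for _ in range(epochs):
        grad = (2 / n) * Xb.T @ (Xb @ w - y)
        grad[1:] += 2 * lam * w[1:]  # gradient of the L2 penalty, skipping the bias
        w -= lr * grad
    return w

# Synthetic orders with features on very different raw scales (illustrative only)
rng = np.random.default_rng(1)
n = 400
distance_km = rng.uniform(0.5, 20, n)
prep_min = rng.uniform(5, 30, n)
y = 10 + 3 * distance_km + 0.8 * prep_min + rng.normal(0, 5, n)
X = np.column_stack([distance_km, prep_min])

X_tr, X_te = standardise(X[:300], X[300:])
y_tr = y[:300]
w_ols = ridge_gd(X_tr, y_tr, lam=0.0)
w_ridge = ridge_gd(X_tr, y_tr, lam=1.0)
print("no penalty:", np.round(w_ols, 3))
print("lam = 1.0: ", np.round(w_ridge, 3))
```

With lam = 1.0 the slope weights shrink towards zero, while the bias stays at the training mean of the target: the standardised training columns are centred, so the bias gradient does not interact with the penalty.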

3.2. Implement the Linear Regression algorithm using Gradient Descent on the Uber Eats dataset to predict delivery time.

Your implementation should include the following steps:

Data cleaning (e.g., handling missing values)
(3 marks)

Encoding categorical variables
(4 marks)

Feature selection
(5 marks)

Feature standardisation
(3 marks)

Splitting data
(2 marks)

Training model using Gradient Descent
(5 marks)

Evaluating model performance using Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and R²
(5 marks)
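One point worth making about the order of these steps: the code below imputes, ranks features and fits the scaler on all rows before calling train_test_split, so statistics from the test rows feed into preprocessing. A stricter variant splits first and learns every preprocessing statistic from the training rows only, and drops rows with a missing target rather than imputing labels. The sketch below is one way to do that, assuming the column names from the data dictionary; prepare_splits and TARGET are hypothetical names, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

TARGET = "Delivery_Time_min"  # target column from the data dictionary

def prepare_splits(df, test_size=0.2, k=6, seed=42):
    """Hypothetical helper: split first, then learn fill values, dummy columns,
    feature ranking and scaling from the training rows only."""
    # drop the ID and any rows whose target is missing (labels are not imputed)
    df = df.drop(columns=["Order_ID"], errors="ignore").dropna(subset=[TARGET])
    train, test = train_test_split(df, test_size=test_size, random_state=seed)
    train, test = train.copy(), test.copy()
    features = [c for c in df.columns if c != TARGET]

    # impute with training medians (numeric) or training modes (categorical)
    for c in features:
        if pd.api.types.is_numeric_dtype(train[c]):
            fill = train[c].median()
        else:
            fill = train[c].mode().iloc[0]
        train[c] = train[c].fillna(fill)
        test[c] = test[c].fillna(fill)

    # one-hot encode; test dummies are aligned to the training columns, so a
    # category unseen in training becomes all zeros (the baseline level)
    X_train = pd.get_dummies(train[features], drop_first=True).astype(float)
    X_test = (pd.get_dummies(test[features]).astype(float)
              .reindex(columns=X_train.columns, fill_value=0.0))
    y_train = train[TARGET].to_numpy(dtype=float)
    y_test = test[TARGET].to_numpy(dtype=float)

    # rank features by |Pearson r| with the training target and keep the top k
    corrs = X_train.apply(lambda col: abs(np.corrcoef(col, y_train)[0, 1]))
    selected = corrs.sort_values(ascending=False).head(k).index.tolist()

    # fit the scaler on the training rows, then apply it to both splits
    scaler = StandardScaler().fit(X_train[selected])
    return (scaler.transform(X_train[selected]), scaler.transform(X_test[selected]),
            y_train, y_test, selected)

# Usage on a small synthetic frame (made-up values, not the Uber Eats data)
rng = np.random.default_rng(0)
n = 200
dist = rng.uniform(1, 20, n)
prep = rng.integers(5, 30, n)
demo = pd.DataFrame({
    "Order_ID": np.arange(n),
    "Distance_km": dist,
    "Weather": rng.choice(["Clear", "Rainy", "Snowy"], n),
    "Preparation_Time_min": prep,
    "Delivery_Time_min": 10 + 3 * dist + prep + rng.normal(0, 3, n),
})
demo.loc[::10, "Weather"] = np.nan
demo.loc[::15, "Distance_km"] = np.nan

X_tr, X_te, y_tr, y_te, sel = prepare_splits(demo, k=3)
print(sel, X_tr.shape, X_te.shape)
```

Encoding the test split without drop_first and then reindexing to the training columns keeps the baseline level consistent between the splits, even if the test rows happen to lack a category.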

#Question 3.2
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# load
df = pd.read_csv("uber_eats.csv")

# Data cleaning
# drop the identifier column: it is unique per order and carries no predictive signal
if "Order_ID" in df.columns:
    df = df.drop(columns=["Order_ID"])

# fill missing numeric values with the column median and missing categorical values with the mode
# (num_cols also contains the target, Delivery_Time_min)
num_cols = df.select_dtypes(include=[np.number]).columns.tolist()
cat_cols = df.select_dtypes(include=["object"]).columns.tolist()
for c in num_cols:
    df[c] = df[c].fillna(df[c].median())
for c in cat_cols:
    df[c] = df[c].fillna(df[c].mode().iloc[0])

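# Encoding categorical variables: one-hot encode; drop_first=True drops one baseline level per
# variable so the dummies are not perfectly collinear with the bias term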
X = pd.get_dummies(df.drop(columns=["Delivery_Time_min"]), drop_first=True)
y = df["Delivery_Time_min"].values

# Feature selection
# rank features by absolute Pearson correlation with the target and keep the top k (at most 6)
corrs = pd.Series({col: abs(np.corrcoef(X[col].values, y)[0,1]) for col in X.columns})
k = min(6, X.shape[1])
selected = corrs.sort_values(ascending=False).head(k).index.tolist()
X = X[selected]
print("Selected features:", selected)

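# Feature standardisation: zero mean and unit variance per column
# (the scaler is fitted on all rows, before the train/test split below)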
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data: 80% train, 20% test, fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

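# Training model using Gradient Descent
def train_gd(X, y, lr=0.01, epochs=2000):
    """Batch gradient descent on the MSE loss; returns weights with the bias in w[0]."""
    n, m = X.shape
    # add bias column
    Xb = np.hstack([np.ones((n,1)), X])
    w = np.zeros(m+1)
    for epoch in range(epochs):
        preds = Xb.dot(w)
        # gradient of MSE = (1/n) * ||Xb w - y||^2 with respect to w
        grad = (2/n) * Xb.T.dot(preds - y)
        w -= lr * grad
    return w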

w = train_gd(X_train, y_train, lr=0.01, epochs=3000)

# prediction helper: prepend the same bias column used in training
def predict(w, X):
    Xb = np.hstack([np.ones((X.shape[0],1)), X])
    return Xb.dot(w)

y_pred = predict(w, X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"MSE={mse:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")

Selected features: ['Distance_km', 'Preparation_Time_min', 'Weather_Snowy', 'Traffic_Level_Low', 'Courier_Experience_yrs', 'Weather_Rainy']
MSE=98.62  MAE=6.74  RMSE=9.93  R2=0.780
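For reference, the four metrics follow directly from the prediction errors. A minimal sketch (regression_metrics is a hypothetical helper and the numbers are made up) that computes them from their definitions and checks them against the sklearn functions used above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE and R^2 computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)                          # mean squared error
    mae = np.mean(np.abs(err))                       # mean absolute error
    rmse = np.sqrt(mse)                              # same units as the target (minutes)
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread around the mean
    r2 = 1 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}

# Illustrative values only (not taken from the Uber Eats results)
y_true = np.array([30.0, 45.0, 52.0, 38.0, 61.0])
y_pred = np.array([33.0, 41.0, 50.0, 40.0, 58.0])
m = regression_metrics(y_true, y_pred)
print(m)
print(np.isclose(m["R2"], r2_score(y_true, y_pred)))

# Always predicting the mean of y_true scores R^2 = 0 by construction
baseline = regression_metrics(y_true, np.full_like(y_true, y_true.mean()))
```

R² compares the model's squared error with that of always predicting the mean of the test targets, so R2=0.780 means the squared error on the test set is about 22% of that mean-only baseline; RMSE is simply the square root of MSE (√98.62 ≈ 9.93 minutes).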
