<h1> Gradient Boosting for predicting delivery time </h1>

<h2> TABLE OF CONTENTS </h2>
<div class ="alert alert-block alert-info" style="margin-top: 20px">
  <ul>
    <li><a href = "#preprocessing"> Preprocessing </a></li>
    <li><a href = "#data"> Build Data </a></li>
    <li>
      <a href = "#model"> Build Model </a>
      <ul>
        <li><a href="#gb">Gradient Boosting</a></li>
        <li><a href="#assess">Assessing model by R2, MAE, MSE</a></li>
      </ul>
    </li>
    <li><a href = "#rank"> Feature Ranking </a></li>
  </ul>
</div>

<h2 id = "preprocessing"> Preprocessing </h2>

In [12]:
import pandas as pd

In [13]:
# Read the data
data_train = pd.read_csv("train.csv")

# Drop rows that contain null values
data_train.dropna(axis=0, inplace = True)

# Drop the ID column
data_train.drop('ID',axis=1, inplace=True)

In [14]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

is_Category = data_train.dtypes == object

category_column_list = data_train.columns[is_Category].tolist()

data_train[category_column_list] = data_train[category_column_list].apply(lambda col: le.fit_transform(col))
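The cell above reuses one `LabelEncoder` for every column, so after the `apply` call the encoder only remembers the classes of the last column it saw, and the earlier columns can no longer be decoded. A minimal sketch of one way around this, assuming a small hypothetical DataFrame (the values below are illustrative and not taken from `train.csv`), keeps one fitted encoder per column:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for two categorical columns
toy = pd.DataFrame({
    "Weather": ["Sunny", "Fog", "Sunny", "Windy"],
    "City": ["Urban", "Metro", "Urban", "Urban"],
})

encoders = {}          # column name -> fitted LabelEncoder
encoded = toy.copy()
for col in toy.columns:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(toy[col])  # classes are sorted, then numbered from 0
    encoders[col] = enc

# Each column can now be mapped back to its original labels
decoded_weather = encoders["Weather"].inverse_transform(encoded["Weather"])
print(encoded)
print(list(decoded_weather))
```

Note that the scikit-learn documentation describes `LabelEncoder` as a tool for target labels; for feature columns, `OrdinalEncoder` is the usual alternative and handles several columns at once.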

In [15]:
data_train.head(3)

index,Delivery_person_ID,Delivery_person_Age,Delivery_person_Ratings,Restaurant_latitude,Restaurant_longitude,Delivery_location_latitude,Delivery_location_longitude,Distance,Order_Date,Time_Orderd,Time_Order_picked,Weather,Road_traffic_density,Vehicle_condition,Type_of_order,Type_of_vehicle,multiple_deliveries,Festival,City,Time_taken
1,1299,27.0,21,305,43,2378,513,169,39,139,153,5,2,2,0,1,1.0,0,2,14
2,14,34.0,24,442,255,3940,2929,807,26,148,164,0,2,1,0,1,1.0,0,0,27
3,265,23.0,25,356,228,3010,2607,2055,30,96,106,2,3,2,3,1,1.0,0,0,21


In [16]:
print(len(data_train.index))

41368


<h2 id="data"> Build data </h2>

In [17]:
import numpy as np

# Split the dataset into two parts: the features and the target (the delivery time to predict)
X = np.array(data_train.loc[:, data_train.columns != 'Time_taken'].values, dtype=np.float64)
y = np.array(data_train['Time_taken'].values, dtype=np.float64)

In [18]:
print(X)
print(y)

[[1.299e+03 2.700e+01 2.100e+01 ... 1.000e+00 0.000e+00 2.000e+00]
 [1.400e+01 3.400e+01 2.400e+01 ... 1.000e+00 0.000e+00 0.000e+00]
 [2.650e+02 2.300e+01 2.500e+01 ... 1.000e+00 0.000e+00 0.000e+00]
 ...
 [9.890e+02 3.400e+01 1.900e+01 ... 2.000e+00 0.000e+00 2.000e+00]
 [6.510e+02 3.200e+01 1.500e+01 ... 1.000e+00 0.000e+00 0.000e+00]
 [6.430e+02 2.900e+01 2.300e+01 ... 1.000e+00 0.000e+00 0.000e+00]]
[14. 27. 21. ... 39. 24. 31.]


In [19]:
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=36)

<h2 id="model"> Build model </h2>

In [36]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

<h3 id="gb">Gradient Boosting</h3>

In [21]:
# Fit a Gradient Boosting regressor to predict y
gbm = GradientBoostingRegressor(random_state=0)
gbm.fit(X_train, y_train)

# R^2 on the test set; this is the value displayed below
gbm.score(X_test, y_test)


0.7792687885161625
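To illustrate the idea behind `GradientBoostingRegressor`, here is a minimal sketch of gradient boosting for squared error: start from a constant prediction, then repeatedly fit a shallow tree to the current residuals and add a shrunken copy of its output. It runs on synthetic data and is a simplification, not scikit-learn's actual implementation; the settings (100 rounds, learning rate 0.1, depth 3) are chosen to mirror the library defaults.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 1))                    # synthetic feature
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.1, size=300)  # synthetic target

learning_rate = 0.1
n_rounds = 100

# Round 0: a constant prediction (the mean minimises squared error)
prediction = np.full_like(y_demo, y_demo.mean())
baseline_mse = np.mean((y_demo - prediction) ** 2)
trees = []

for _ in range(n_rounds):
    # For squared error, the negative gradient is simply the residual
    residuals = y_demo - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_demo, residuals)
    # Move the prediction a small step towards the residuals
    prediction += learning_rate * tree.predict(X_demo)
    trees.append(tree)

boosted_mse = np.mean((y_demo - prediction) ** 2)
print(f"MSE of the constant model: {baseline_mse:.4f}")
print(f"MSE after boosting:        {boosted_mse:.4f}")
```

Predicting for new rows would sum the initial mean and `learning_rate * tree.predict(...)` over all stored trees.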


<h3 id="assess">Assessing model by R2, MAE, MSE</h3>

In [38]:
# Evaluate the gradient boosting model with R2
r2 = r2_score(y_test,gbm.predict(X_test))
print(r2)

0.7792687885161625


In [39]:
# Evaluate the model with MSE
mse = mean_squared_error(y_test,gbm.predict(X_test))
print(mse)

19.184329320330974


In [40]:
# Evaluate the model with MAE
mae = mean_absolute_error(y_test,gbm.predict(X_test))
print(mae)

3.5070158110534586
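The three scores above follow from simple formulas. As a sketch, the cell below recomputes them with NumPy on a tiny hypothetical example (the numbers are made up for illustration) and checks them against scikit-learn. It also computes the RMSE, which is the square root of the MSE and is expressed in the same units as the target; for the model above, an MSE of about 19.18 corresponds to an RMSE of about 4.38.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted delivery times
y_true = np.array([14.0, 27.0, 21.0, 39.0])
y_pred = np.array([16.0, 25.0, 22.0, 35.0])

errors = y_true - y_pred                       # [-2, 2, -1, 4]

mse_manual = np.mean(errors ** 2)              # mean of squared errors
mae_manual = np.mean(np.abs(errors))           # mean of absolute errors
rmse_manual = np.sqrt(mse_manual)              # same units as y

# R2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mae_manual, rmse_manual, r2_manual)

# The manual values agree with scikit-learn's implementations
assert np.isclose(mse_manual, mean_squared_error(y_true, y_pred))
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
```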


<h2 id = "rank"> Feature Ranking </h2>

In [31]:
import numpy as np
importances = gbm.feature_importances_

# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]
# Feature names only (target excluded), in the same order as the columns of X
Features = data_train.columns.drop('Time_taken')

In [32]:
print("Feature ranking:")
for i, index in enumerate(indices):
    print(f"{i+1}. {Features[index]}: Importance {importances[index]}")

Feature ranking:
1. Delivery_person_Ratings: Importance 0.23266912554408356
2. Weather: Importance 0.1350546442130052
3. Road_traffic_density: Importance 0.12095475108013512
4. multiple_deliveries: Importance 0.11872697162324522
5. Distance: Importance 0.09898127577322319
6. Vehicle_condition: Importance 0.09409328083950434
7. Delivery_person_Age: Importance 0.09353819525199936
8. Time_Orderd: Importance 0.05380130334670909
9. Festival: Importance 0.029059198568384702
10. Time_Order_picked: Importance 0.011968917805444453
11. City: Importance 0.010937585440344831
12. Restaurant_latitude: Importance 7.949627187411405e-05
13. Restaurant_longitude: Importance 4.646301100470623e-05
14. Delivery_location_latitude: Importance 3.080453367434108e-05
15. Type_of_vehicle: Importance 2.9504375015293217e-05
16. Delivery_person_ID: Importance 2.8482322352468326e-05
17. Order_Date: Importance 0.0
18. Delivery_location_longitude: Importance 0.0
19. Type_of_order: Importance 0.0
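
The values in `feature_importances_` are impurity-based and computed on the training data, which can overstate features with many distinct values. One possible cross-check is permutation importance on held-out data, which measures how much the score drops when a single column is shuffled. The sketch below uses `sklearn.inspection.permutation_importance` on synthetic data; the feature names and the data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
# Only the first column drives the target; the other two are pure noise
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=500)
names = ["signal", "noise_a", "noise_b"]  # hypothetical feature names

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=36
)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle each column of the test set 5 times and record the drop in R2
result = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)

order = np.argsort(result.importances_mean)[::-1]
print("Permutation ranking:")
for rank, idx in enumerate(order, start=1):
    print(f"{rank}. {names[idx]}: {result.importances_mean[idx]:.4f}")
```

In this notebook the equivalent call would pass `gbm`, `X_test`, and `y_test`, and the resulting order could then be compared with the ranking above.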
