# **EDSA - Sendy Logistics Challenge**

by **EXPLORE Data Science Academy**

Linear regression sprint

##   Project overview:

Sendy, in partnership with insight2impact facility, is hosting a Zindi challenge to predict the estimated time of delivery of orders, from the point of driver pickup to the point of arrival at final destination.

The solution will help Sendy enhance customer communication and improve the reliability of its service, which will ultimately improve customer experience. In addition, the solution will enable Sendy to realise cost savings, and ultimately reduce the cost of doing business, through improved resource management and planning for order scheduling.

Data is a critical component in helping Sendy to build more efficient, affordable and accessible solutions. Given the details of a Sendy order, use historic data to predict a time for the arrival of the rider at the destination of a package. Build a model that predicts an accurate delivery time, from picking up a package to arriving at the final destination. An accurate arrival time prediction will help all businesses to improve their logistics and communicate an accurate time to their customers.

##  Data:

The dataset provided by Sendy includes order details and rider metrics based on orders made on the Sendy platform.

*Datasets:*

*   Train.csv - is the dataset that you will use to train your model.
*   Test.csv - is the dataset to which you will apply your model.
*   Riders.csv - contains unique rider Ids, number of orders, age, rating and number of ratings.
*   VariableDefinitions.csv - Definitions of variables in the Train, Test and Riders files



###   Variables:

**Order details**
*   Order No – Unique number identifying the order
*   User Id – Unique number identifying the customer on a platform
*   Vehicle Type – For this competition limited to bikes, however in practice, Sendy service extends to trucks and vans
*   Platform Type – Platform used to place the order, there are 4 types
*   Personal or Business – Customer type

**Placement times**
*   Placement - Day of Month i.e 1-31
*   Placement - Weekday (Monday = 1)
*   Placement - Time - Time of day the order was placed

**Confirmation times**
*   Confirmation - Day of Month i.e 1-31
*   Confirmation - Weekday (Monday = 1)
*   Confirmation - Time - time of day the order was confirmed by a rider

**Arrival at Pickup times**
*   Arrival at Pickup - Day of Month i.e 1-31
*   Arrival at Pickup - Weekday (Monday = 1)
*   Arrival at Pickup - Time - Time of day the rider arrived at the location to pick up the order - as marked by the rider through the Sendy application

**Pickup times**
*   Pickup - Day of Month i.e 1-31
*   Pickup - Weekday (Monday = 1)
*   Pickup - Time - Time of day the rider picked up the order - as marked by the rider through the Sendy application

**Arrival at Destination times** *(columns missing in Test set)*
*   Arrival at Delivery - Day of Month i.e 1-31
*   Arrival at Delivery - Weekday (Monday = 1)
*   Arrival at Delivery - Time - Time of day the rider arrived at the destination to deliver the order - as marked by the rider through the Sendy application

**Other order details**
*   Distance covered (KM) - The distance from Pickup to Destination
*   Temperature - Temperature at the time of order placement in Degrees Celsius (measured every three hours)
*   Precipitation in Millimeters - Precipitation at the time of order placement (measured every three hours)
*   Pickup Latitude and Longitude - Latitude and longitude of pick up location
*   Destination Latitude and Longitude - Latitude and longitude of delivery location
*   Rider ID – ID of the Rider who accepted the order
*   Time from Pickup to Arrival - Time in seconds between ‘Pickup’ and ‘Arrival at Destination’ - calculated from the columns for the purpose of facilitating the task

**Rider metrics**
*   Rider ID – Unique number identifying the rider (same as in order details)
*   No of Orders – Number of Orders the rider has delivered
*   Age – Number of days since the rider delivered the first order
*   Average Rating – Average rating of the rider
*   No of Ratings - Number of ratings the rider has received. Rating an order is optional for the customer.
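
The definition of `Time from Pickup to Arrival` above can be illustrated with a minimal sketch. It assumes the time columns hold 12-hour clock strings such as `9:35:46 AM`, and that a negative difference means the delivery crossed midnight. The helper name `seconds_between` and the two example rows are hypothetical and are not taken from the dataset.

```python
import pandas as pd

def seconds_between(pickup_times, arrival_times):
    """Return seconds elapsed between two Series of 12-hour clock strings."""
    clock_format = "%I:%M:%S %p"
    start = pd.to_datetime(pickup_times, format=clock_format)
    end = pd.to_datetime(arrival_times, format=clock_format)
    elapsed = (end - start).dt.total_seconds()
    # Assumption: a negative gap means the delivery crossed midnight,
    # so wrap it into the range [0, 86400).
    return elapsed % 86400

# Hypothetical rows for illustration only.
demo = pd.DataFrame({
    "Pickup - Time": ["10:27:30 AM", "11:55:00 PM"],
    "Arrival at Destination - Time": ["10:39:55 AM", "12:05:00 AM"],
})
demo["Time from Pickup to Arrival"] = seconds_between(
    demo["Pickup - Time"], demo["Arrival at Destination - Time"]
)
print(demo)
```

The wrap-around is a simplification: a delivery lasting 24 hours or more would need the day-of-month columns as well.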

##  Imports:

In [9]:
# import important libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

##  Data Loading:

In [10]:
# load the data from the CSV files into pandas DataFrames

train_df = pd.read_csv('https://raw.githubusercontent.com/Kaekaefx/Group5_Gather_Predict/master/database_tables_csv/Data/Train.csv')
test_df = pd.read_csv('https://raw.githubusercontent.com/Kaekaefx/Group5_Gather_Predict/master/database_tables_csv/Data/Test.csv')
riders_df = pd.read_csv('https://raw.githubusercontent.com/0731325603/regression-predict-api-template/master/Data/Riders.csv')
sample_submission_df = pd.read_csv('https://raw.githubusercontent.com/0731325603/regression-predict-api-template/master/Data/SampleSubmission.csv')
variable_def_df = pd.read_csv('https://raw.githubusercontent.com/Kaekaefx/Group5_Gather_Predict/master/database_tables_csv/Data/VariableDefinitions.csv')

In [11]:
train_df.head()

| | Order No | User Id | Vehicle Type | Platform Type | Personal or Business | Placement - Day of Month | Placement - Weekday (Mo = 1) | Placement - Time | Confirmation - Day of Month | Confirmation - Weekday (Mo = 1) | ... | Arrival at Destination - Time | Distance (KM) | Temperature | Precipitation in millimeters | Pickup Lat | Pickup Long | Destination Lat | Destination Long | Rider Id | Time from Pickup to Arrival |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Order_No_4211 | User_Id_633 | Bike | 3 | Business | 9 | 5 | 9:35:46 AM | 9 | 5 | ... | 10:39:55 AM | 4 | 20.4 | NaN | -1.317755 | 36.83037 | -1.300406 | 36.829741 | Rider_Id_432 | 745 |
| 1 | Order_No_25375 | User_Id_2285 | Bike | 3 | Personal | 12 | 5 | 11:16:16 AM | 12 | 5 | ... | 12:17:22 PM | 16 | 26.4 | NaN | -1.351453 | 36.899315 | -1.295004 | 36.814358 | Rider_Id_856 | 1993 |
| 2 | Order_No_1899 | User_Id_265 | Bike | 3 | Business | 30 | 2 | 12:39:25 PM | 30 | 2 | ... | 1:00:38 PM | 3 | NaN | NaN | -1.308284 | 36.843419 | -1.300921 | 36.828195 | Rider_Id_155 | 455 |
| 3 | Order_No_9336 | User_Id_1402 | Bike | 3 | Business | 15 | 5 | 9:25:34 AM | 15 | 5 | ... | 10:05:27 AM | 9 | 19.2 | NaN | -1.281301 | 36.832396 | -1.257147 | 36.795063 | Rider_Id_855 | 1341 |
| 4 | Order_No_27883 | User_Id_1737 | Bike | 1 | Personal | 13 | 1 | 9:55:18 AM | 13 | 1 | ... | 10:25:37 AM | 9 | 15.4 | NaN | -1.266597 | 36.792118 | -1.295041 | 36.809817 | Rider_Id_770 | 1214 |
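
The preview shows blank `Temperature` and `Precipitation in millimeters` cells, which pandas reads as `NaN`. A minimal sketch of how missingness could be quantified before modelling is given here. It rebuilds only those two columns from the five previewed rows, so the shares it prints describe the preview, not the full training set. The names `preview` and `missing` are illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the five previewed rows; blank cells become NaN.
preview = pd.DataFrame({
    "Temperature": [20.4, 26.4, np.nan, 19.2, 15.4],
    "Precipitation in millimeters": [np.nan] * 5,
})

# Count and share of missing values per column.
missing = pd.DataFrame({
    "missing_count": preview.isna().sum(),
    "missing_share": preview.isna().mean(),
})
print(missing)
```

On the real data the same two expressions would be applied to `train_df` and `test_df`.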


In [12]:
test_df.head()

| | Order No | User Id | Vehicle Type | Platform Type | Personal or Business | Placement - Day of Month | Placement - Weekday (Mo = 1) | Placement - Time | Confirmation - Day of Month | Confirmation - Weekday (Mo = 1) | ... | Pickup - Weekday (Mo = 1) | Pickup - Time | Distance (KM) | Temperature | Precipitation in millimeters | Pickup Lat | Pickup Long | Destination Lat | Destination Long | Rider Id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Order_No_19248 | User_Id_3355 | Bike | 3 | Business | 27 | 3 | 4:44:10 PM | 27 | 3 | ... | 3 | 5:06:47 PM | 8 | NaN | NaN | -1.333275 | 36.870815 | -1.305249 | 36.82239 | Rider_Id_192 |
| 1 | Order_No_12736 | User_Id_3647 | Bike | 3 | Business | 17 | 5 | 12:57:35 PM | 17 | 5 | ... | 5 | 1:25:37 PM | 5 | NaN | NaN | -1.272639 | 36.794723 | -1.277007 | 36.823907 | Rider_Id_868 |
| 2 | Order_No_768 | User_Id_2154 | Bike | 3 | Business | 27 | 4 | 11:08:14 AM | 27 | 4 | ... | 4 | 11:57:54 AM | 5 | 22.8 | NaN | -1.290894 | 36.822971 | -1.276574 | 36.851365 | Rider_Id_26 |
| 3 | Order_No_15332 | User_Id_2910 | Bike | 3 | Business | 17 | 1 | 1:51:35 PM | 17 | 1 | ... | 1 | 2:16:52 PM | 5 | 24.5 | NaN | -1.290503 | 36.809646 | -1.303382 | 36.790658 | Rider_Id_685 |
| 4 | Order_No_21373 | User_Id_1205 | Bike | 3 | Business | 11 | 2 | 11:30:28 AM | 11 | 2 | ... | 2 | 11:56:04 AM | 6 | 24.4 | NaN | -1.281081 | 36.814423 | -1.266467 | 36.792161 | Rider_Id_858 |


In [13]:
riders_df.head()

| | Rider Id | No_Of_Orders | Age | Average_Rating | No_of_Ratings |
|---|---|---|---|---|---|
| 0 | Rider_Id_396 | 2946 | 2298 | 14.0 | 1159 |
| 1 | Rider_Id_479 | 360 | 951 | 13.5 | 176 |
| 2 | Rider_Id_648 | 1746 | 821 | 14.3 | 466 |
| 3 | Rider_Id_753 | 314 | 980 | 12.5 | 75 |
| 4 | Rider_Id_335 | 536 | 1113 | 13.7 | 156 |
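
Because the rider metrics share the `Rider Id` key with the order data, they can be attached to each order with a left join. The sketch below uses two riders from the preview above and three hypothetical orders (`Order_A` to `Order_C`). `Rider_Id_0` is an invented id with no match, included to show what an unmatched order looks like. It is one possible approach, not necessarily the one used later in this notebook.

```python
import pandas as pd

# Two riders copied from the preview above.
riders_demo = pd.DataFrame({
    "Rider Id": ["Rider_Id_396", "Rider_Id_479"],
    "No_Of_Orders": [2946, 360],
    "Average_Rating": [14.0, 13.5],
})

# Hypothetical orders; "Rider_Id_0" is deliberately absent from riders_demo.
orders_demo = pd.DataFrame({
    "Order No": ["Order_A", "Order_B", "Order_C"],
    "Rider Id": ["Rider_Id_479", "Rider_Id_396", "Rider_Id_0"],
})

merged = orders_demo.merge(
    riders_demo,
    on="Rider Id",
    how="left",               # keep every order, matched or not
    validate="many_to_one",   # fail if a rider appears twice in riders_demo
    indicator=True,           # adds a "_merge" column flagging unmatched rows
)
print(merged)
```

A left join keeps the number of orders unchanged, and the `_merge` column makes it easy to check whether any order refers to a rider missing from the riders table.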


In [14]:
sample_submission_df.head()

| | Order_No | Time from Pickup to Arrival |
|---|---|---|
| 0 | Order_No_19248 | 567.0 |
| 1 | Order_No_12736 | 4903.0 |
| 2 | Order_No_768 | 5649.0 |
| 3 | Order_No_15332 | NaN |
| 4 | Order_No_21373 | NaN |


In [15]:
variable_def_df.head()

| | Order No | Unique number identifying the order |
|---|---|---|
| 0 | User Id | Unique number identifying the customer on a pl... |
| 1 | Vehicle Type | For this competition limited to bikes, however... |
| 2 | Platform Type | Platform used to place the order, there are 4 ... |
| 3 | Personal or Business | Customer type |
| 4 | Placement - Day of Month | Placement - Day of Month i.e 1-31 |


##   Data Preprocessing:

In [16]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21201 entries, 0 to 21200
Data columns (total 29 columns):
Order No                                     21201 non-null object
User Id                                      21201 non-null object
Vehicle Type                                 21201 non-null object
Platform Type                                21201 non-null int64
Personal or Business                         21201 non-null object
Placement - Day of Month                     21201 non-null int64
Placement - Weekday (Mo = 1)                 21201 non-null int64
Placement - Time                             21201 non-null object
Confirmation - Day of Month                  21201 non-null int64
Confirmation - Weekday (Mo = 1)              21201 non-null int64
Confirmation - Time                          21201 non-null object
Arrival at Pickup - Day of Month             21201 non-null int64
Arrival at Pickup - Weekday (Mo = 1)         21201 non-null int64
Arrival at Pickup - Time   

In [17]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7068 entries, 0 to 7067
Data columns (total 25 columns):
Order No                                7068 non-null object
User Id                                 7068 non-null object
Vehicle Type                            7068 non-null object
Platform Type                           7068 non-null int64
Personal or Business                    7068 non-null object
Placement - Day of Month                7068 non-null int64
Placement - Weekday (Mo = 1)            7068 non-null int64
Placement - Time                        7068 non-null object
Confirmation - Day of Month             7068 non-null int64
Confirmation - Weekday (Mo = 1)         7068 non-null int64
Confirmation - Time                     7068 non-null object
Arrival at Pickup - Day of Month        7068 non-null int64
Arrival at Pickup - Weekday (Mo = 1)    7068 non-null int64
Arrival at Pickup - Time                7068 non-null object
Pickup - Day of Month                   7068 n
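
`train_df.info()` reports 29 columns and `test_df.info()` reports 25, so four training columns have no counterpart in the Test set. A minimal sketch of listing such columns follows. It uses empty toy frames carrying a subset of the column names seen in the previews, and the names `train_demo` and `test_demo` are illustrative.

```python
import pandas as pd

# Toy frames with a subset of the real column names (no rows needed).
train_demo = pd.DataFrame(columns=[
    "Order No", "Pickup - Time", "Arrival at Destination - Time",
    "Distance (KM)", "Rider Id", "Time from Pickup to Arrival",
])
test_demo = pd.DataFrame(columns=[
    "Order No", "Pickup - Time", "Distance (KM)", "Rider Id",
])

# Columns present in training data but absent from the test data,
# kept in their original training order.
only_in_train = [c for c in train_demo.columns if c not in test_demo.columns]
print(only_in_train)
```

Applied to `train_df` and `test_df`, the same list comprehension shows which columns must be dropped or treated as the target before fitting a model.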

In [18]:
riders_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 960 entries, 0 to 959
Data columns (total 5 columns):
Rider Id          960 non-null object
No_Of_Orders      960 non-null int64
Age               960 non-null int64
Average_Rating    960 non-null float64
No_of_Ratings     960 non-null int64
dtypes: float64(1), int64(3), object(1)
memory usage: 37.6+ KB


In [19]:
sample_submission_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7068 entries, 0 to 7067
Data columns (total 2 columns):
Order_No                       7068 non-null object
Time from Pickup to Arrival    3 non-null float64
dtypes: float64(1), object(1)
memory usage: 110.6+ KB
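
The sample submission has two columns, `Order_No` and `Time from Pickup to Arrival`, with one row per Test order. Note that the key is spelled `Order_No` here but `Order No` in the Test data. A minimal sketch of assembling a file in that shape is shown below. The helper `make_submission` and the prediction values are hypothetical, and the order numbers are taken from the preview.

```python
import pandas as pd

def make_submission(order_numbers, predictions):
    """Build a frame with the two columns used by the sample submission."""
    submission = pd.DataFrame({
        "Order_No": list(order_numbers),
        "Time from Pickup to Arrival": list(predictions),
    })
    # Guard against gaps, since the sample file itself is mostly empty.
    if submission["Time from Pickup to Arrival"].isna().any():
        raise ValueError("every order needs a predicted time")
    return submission

# Hypothetical predictions in seconds, for illustration only.
submission = make_submission(
    ["Order_No_19248", "Order_No_12736"],
    [1200.0, 950.5],
)
print(submission)
```

Writing the result with `submission.to_csv("submission.csv", index=False)` avoids adding an extra index column to the file.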


In [21]:
variable_def_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 2 columns):
Order No                               34 non-null object
Unique number identifying the order    33 non-null object
dtypes: object(2)
memory usage: 688.0+ bytes
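
The column labels reported above (`Order No` and `Unique number identifying the order`) look like the first definition row rather than real headers, which suggests the file has no header line. A minimal sketch of reading such a file with `header=None` follows. The column names `Variable` and `Definition` are hypothetical, and an in-memory string with three definitions from the Variables section stands in for the CSV.

```python
import io

import pandas as pd

# Stand-in for a headerless definitions file.
raw_csv = io.StringIO(
    "Order No,Unique number identifying the order\n"
    "User Id,Unique number identifying the customer on a platform\n"
    "Personal or Business,Customer type\n"
)

# header=None keeps the first row as data; names supplies the labels.
definitions = pd.read_csv(raw_csv, header=None, names=["Variable", "Definition"])
print(definitions)
```

With this reading the `Order No` definition stays in the table as row 0 instead of being consumed as a column label.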


###   Exploratory Data Analysis