# Amazon Delivery Time Prediction EDA

## Importing libraries

In [22]:
import pandas as pd # for Data Manipulation
import numpy as np # for Numerical Computation
from ydata_profiling import ProfileReport # for profiling and EDA
import matplotlib.pyplot as plt # for Visualization
from sklearn.preprocessing import LabelEncoder # for Encoding Categorical Variables
from datetime import datetime # for handling date and time

## Reading Data

In [75]:
df = pd.read_csv('amazon_delivery.csv')
df.head()

| | Order_ID | Agent_Age | Agent_Rating | Store_Latitude | Store_Longitude | Drop_Latitude | Drop_Longitude | Order_Date | Order_Time | Pickup_Time | Weather | Traffic | Vehicle | Area | Delivery_Time | Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ialx566343618 | 37 | 4.9 | 22.745049 | 75.892471 | 22.765049 | 75.912471 | 2022-03-19 | 11:30:00 | 11:45:00 | Sunny | High | motorcycle | Urban | 120 | Clothing |
| 1 | akqg208421122 | 34 | 4.5 | 12.913041 | 77.683237 | 13.043041 | 77.813237 | 2022-03-25 | 19:45:00 | 19:50:00 | Stormy | Jam | scooter | Metropolitian | 165 | Electronics |
| 2 | njpu434582536 | 23 | 4.4 | 12.914264 | 77.6784 | 12.924264 | 77.6884 | 2022-03-19 | 08:30:00 | 08:45:00 | Sandstorms | Low | motorcycle | Urban | 130 | Sports |
| 3 | rjto796129700 | 38 | 4.7 | 11.003669 | 76.976494 | 11.053669 | 77.026494 | 2022-04-05 | 18:00:00 | 18:10:00 | Sunny | Medium | motorcycle | Metropolitian | 105 | Cosmetics |
| 4 | zguw716275638 | 32 | 4.6 | 12.972793 | 80.249982 | 13.012793 | 80.289982 | 2022-03-26 | 13:30:00 | 13:45:00 | Cloudy | High | scooter | Metropolitian | 150 | Toys |


### Variables

* **Order_ID**: Unique identifier for each order.
* **Agent_Age**: Age of the delivery agent.
* **Agent_Rating**: Rating of the delivery agent.
* **Store_Latitude/Longitude**: Geographic location of the store.
* **Drop_Latitude/Longitude**: Geographic location of the delivery address.
* **Order_Date/Order_Time**: Date and time when the order was placed.
* **Pickup_Time**: Time when the delivery agent picked up the order.
* **Weather**: Weather conditions during delivery.
* **Traffic**: Traffic conditions during delivery.
* **Vehicle**: Mode of transportation used for delivery.
* **Area**: Type of delivery area (e.g. `Urban`, `Metropolitian`; the latter is spelled this way in the data).
* **Delivery_Time**: Target variable representing the actual time taken for delivery (in hours).
* **Category**: Category of the product being delivered.
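
The store and drop coordinates listed above can be combined into a single distance feature. The following is a minimal sketch, assuming the great-circle (haversine) distance is an acceptable stand-in for the actual route length; the function name `haversine_km` and the Earth-radius constant are illustrative choices and not part of the original notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Store and drop coordinates of the first row shown by df.head()
distance_km = haversine_km(22.745049, 75.892471, 22.765049, 75.912471)
print(f"{distance_km:.2f} km")
```

A straight-line distance ignores the road network, so it is best read as a lower bound on the distance the agent actually travels.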


## Profile / EDA

In [6]:
profile = ProfileReport(df, title="Amazon Delivery EDA Report", explorative=True)
profile.to_notebook_iframe()

### Conclusion/Report

* `Agent_Rating` has a high overall correlation with `Traffic`
* `Drop_Latitude` has a high overall correlation with `Store_Latitude`
* `Drop_Longitude` has a high overall correlation with `Store_Longitude`

* `Area` is highly imbalanced (51.9%)
* `Order_ID` values are all unique
* `Store_Latitude` has 3505 (8.0%) zeros
* `Store_Longitude` has 3505 (8.0%) zeros
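
The zero-coordinate alerts deserve a closer look: the point (0°, 0°) lies in the Gulf of Guinea, while the sample rows appear to be located in India, so a zero is more likely a placeholder than a real store location. The sketch below shows one way to count such zeros and mark them as missing. It runs on a small hypothetical frame (`toy`) and not on the real data, and treating zeros as missing is an assumption rather than a step the notebook takes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real data; column names match the raw CSV.
toy = pd.DataFrame({
    "Store_Latitude": [22.745049, 0.0, 12.914264, 0.0],
    "Store_Longitude": [75.892471, 0.0, 77.678400, 0.0],
})

coord_cols = ["Store_Latitude", "Store_Longitude"]

# Count exact zeros per column, which is the figure the profiling alert reports.
zero_counts = (toy[coord_cols] == 0).sum()
print(zero_counts)

# Assumption: a 0.0 coordinate is a placeholder, so mark it as missing.
cleaned = toy.copy()
cleaned[coord_cols] = cleaned[coord_cols].replace(0.0, np.nan)
print(cleaned.isna().sum())
```

Marking the zeros as `NaN` keeps them out of any distance calculation until a decision is made on whether to drop or impute those rows.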

## Data Preprocessing

In [76]:
# lowercasing column names
df.columns = [col.lower() for col in df.columns]

In [77]:
# Handling Missing Values
df['agent_rating'] = df['agent_rating'].fillna(df['agent_rating'].mean())
df['weather'] = df['weather'].fillna(df['weather'].mode()[0])
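
To make the effect of the two `fillna` calls concrete, here is a minimal sketch on a hypothetical four-row frame (`toy`); the values are made up for illustration and only the two imputed columns are included.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the two imputed columns are included.
toy = pd.DataFrame({
    "agent_rating": [4.9, np.nan, 4.4, 4.7],
    "weather": ["Sunny", "Stormy", None, "Sunny"],
})

missing_before = toy.isna().sum()

# Numeric column: fill with the column mean (NaN values are skipped when averaging).
toy["agent_rating"] = toy["agent_rating"].fillna(toy["agent_rating"].mean())
# Categorical column: fill with the most frequent value; mode() returns a Series, hence [0].
toy["weather"] = toy["weather"].fillna(toy["weather"].mode()[0])

missing_after = toy.isna().sum()
print(missing_before, missing_after, sep="\n\n")
```

Comparing `isna().sum()` before and after is a quick way to confirm that the imputation covered every gap in those columns. Note that `isna()` only sees real missing values, not placeholder strings.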

In [78]:
# converting to datetime
df['order_date'] = pd.to_datetime(df['order_date'])
df['order_time'] = df['order_time'].replace('NaN ','00:00:00') # it contains some 'NaN ' strings
df['order_time'] = pd.to_datetime(df['order_time'],format='mixed').dt.time
df['pickup_time'] = pd.to_datetime(df['pickup_time'],format='mixed').dt.time
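
An alternative to substituting `00:00:00` for the `'NaN '` placeholder is to let unparseable times become `NaT`, so that unknown order times are not mistaken for midnight orders. The sketch below assumes it is applied to the raw string columns, before the `.dt.time` conversion, and uses a hypothetical frame (`toy`). The derived column `pickup_delay_min` and the midnight-rollover rule are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rows; the 'NaN ' string mimics the placeholder found in order_time.
toy = pd.DataFrame({
    "order_date": ["2022-03-19", "2022-03-25", "2022-03-19"],
    "order_time": ["11:30:00", "NaN ", "23:55:00"],
    "pickup_time": ["11:45:00", "19:50:00", "00:05:00"],
})

fmt = "%Y-%m-%d %H:%M:%S"

# Combine date and time strings; anything that does not match the format becomes NaT.
order_ts = pd.to_datetime(toy["order_date"] + " " + toy["order_time"],
                          format=fmt, errors="coerce")
pickup_ts = pd.to_datetime(toy["order_date"] + " " + toy["pickup_time"],
                           format=fmt, errors="coerce")

# Assumption: a pickup earlier than the order means the pickup happened after midnight.
pickup_ts = pickup_ts.mask(pickup_ts < order_ts, pickup_ts + pd.Timedelta(days=1))

# Minutes between order and pickup; rows with an unknown order time stay NaN.
toy["pickup_delay_min"] = (pickup_ts - order_ts).dt.total_seconds() / 60
print(toy)
```

Keeping full timestamps instead of bare `time` objects also makes it possible to subtract the two columns directly.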

In [79]:
df.head()

| | order_id | agent_age | agent_rating | store_latitude | store_longitude | drop_latitude | drop_longitude | order_date | order_time | pickup_time | weather | traffic | vehicle | area | delivery_time | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ialx566343618 | 37 | 4.9 | 22.745049 | 75.892471 | 22.765049 | 75.912471 | 2022-03-19 | 11:30:00 | 11:45:00 | Sunny | High | motorcycle | Urban | 120 | Clothing |
| 1 | akqg208421122 | 34 | 4.5 | 12.913041 | 77.683237 | 13.043041 | 77.813237 | 2022-03-25 | 19:45:00 | 19:50:00 | Stormy | Jam | scooter | Metropolitian | 165 | Electronics |
| 2 | njpu434582536 | 23 | 4.4 | 12.914264 | 77.6784 | 12.924264 | 77.6884 | 2022-03-19 | 08:30:00 | 08:45:00 | Sandstorms | Low | motorcycle | Urban | 130 | Sports |
| 3 | rjto796129700 | 38 | 4.7 | 11.003669 | 76.976494 | 11.053669 | 77.026494 | 2022-04-05 | 18:00:00 | 18:10:00 | Sunny | Medium | motorcycle | Metropolitian | 105 | Cosmetics |
| 4 | zguw716275638 | 32 | 4.6 | 12.972793 | 80.249982 | 13.012793 | 80.289982 | 2022-03-26 | 13:30:00 | 13:45:00 | Cloudy | High | scooter | Metropolitian | 150 | Toys |


In [None]:
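# encoding categorical variables
# note: with pandas 2.x defaults, the object-dtype selection below also picks up
# order_id, order_time and pickup_time, not only the categorical columns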
categorical_cols = df.select_dtypes(include=['object']).columns
le = LabelEncoder()