# DoorDash Delivery Analysis

Task:
1. Create a unique key for each delivery (index column)
2. Fix 'created_at' & 'actual_delivery_time' data types
3. Parse Date/Time from created_at
4. Parse Date/Time from actual_delivery_time
5. Create delivery_time: actual_time - created_time
6. Reformat dollar amounts: columns=['subtotal', 'min_item_price', 'max_item_price']
7. Visualize average delivery time for each market
8. Visualize average delivery time for each order_protocol

Dashboard Ideas:
- Store Performance (order_protocol performance, total sales, average delivery time, slowest delivery time, fastest delivery time)
- Market Performance (same metrics as Store Performance)
- Protocol Performance (same metrics as Store Performance)
- Estimated Time vs Actual Delivery Time analysis

Questions (each asks about all-time totals and possible trends over time):
1. Which market has the most/least deliveries?
    - Which stores/store categories get the most/least deliveries?
2. Which market has the quickest/slowest deliveries on average?
3. Which order protocol has the most/least deliveries?
4. Which order protocol has the quickest/slowest deliveries on average?
5. What does the average delivery time look like for each market daily/weekly/monthly/quarterly/annually?
6. What does the average delivery time look like for each order_protocol daily/weekly/monthly/quarterly/annually?

## Data Ingestion

In [23]:
import pandas as pd

In [24]:
data = pd.read_csv("delivery_data.csv")
data.head()

Unnamed: 0,market_id,created_at,actual_delivery_time,store_id,store_primary_category,order_protocol,total_items,subtotal,num_distinct_items,min_item_price,max_item_price,total_onshift_dashers,total_busy_dashers,total_outstanding_orders,estimated_order_place_duration,estimated_store_to_consumer_driving_duration
0,1.0,2015-02-06 22:24:17,2015-02-06 23:27:16,1845,american,1.0,4,3441,4,557,1239,33.0,14.0,21.0,446,861.0
1,2.0,2015-02-10 21:49:25,2015-02-10 22:56:29,5477,mexican,2.0,1,1900,1,1400,1400,1.0,2.0,2.0,446,690.0
2,3.0,2015-01-22 20:39:28,2015-01-22 21:09:09,5477,,1.0,1,1900,1,1900,1900,1.0,0.0,0.0,446,690.0
3,3.0,2015-02-03 21:21:45,2015-02-03 22:13:00,5477,,1.0,6,6900,5,600,1800,1.0,1.0,2.0,446,289.0
4,3.0,2015-02-15 02:40:36,2015-02-15 03:20:26,5477,,1.0,3,3900,3,1100,1600,6.0,6.0,9.0,446,650.0


In [25]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197428 entries, 0 to 197427
Data columns (total 16 columns):
 #   Column                                        Non-Null Count   Dtype  
---  ------                                        --------------   -----  
 0   market_id                                     196441 non-null  float64
 1   created_at                                    197428 non-null  object 
 2   actual_delivery_time                          197421 non-null  object 
 3   store_id                                      197428 non-null  int64  
 4   store_primary_category                        192668 non-null  object 
 5   order_protocol                                196433 non-null  float64
 6   total_items                                   197428 non-null  int64  
 7   subtotal                                      197428 non-null  int64  
 8   num_distinct_items                            197428 non-null  int64  
 9   min_item_price                                19
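
The `info()` output above shows several columns with fewer non-null entries than the 197428 rows (for example `market_id`, `actual_delivery_time`, `store_primary_category`, and `order_protocol`). A minimal sketch of how the gaps could be summarized per column follows. The small `sample` frame is a stand-in built from a few rows of the `head()` output, and the names `missing`, `missing_pct`, and `summary` are illustrative, not part of the original notebook.

```python
import pandas as pd

# Stand-in frame built from the first rows shown above; in the notebook
# the same calls would run on the full `data` frame instead.
sample = pd.DataFrame({
    "market_id": [1.0, 2.0, 3.0, 3.0, 3.0],
    "store_primary_category": ["american", "mexican", None, None, None],
    "order_protocol": [1.0, 2.0, 1.0, 1.0, 1.0],
})

# Number of missing values per column, largest first.
missing = sample.isna().sum().sort_values(ascending=False)

# The same counts expressed as a percentage of all rows.
missing_pct = (missing / len(sample) * 100).round(2)

summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
print(summary)
```

Running the same two lines on `data` would show which columns need attention in the Data Cleaning section.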

### Create An Index Column

In [26]:
# Use the row index as a unique key for each delivery, then move it to the front
data['delivery_id'] = data.index
data = data[['delivery_id', 'market_id', 'created_at', 'actual_delivery_time', 'store_id',
       'store_primary_category', 'order_protocol', 'total_items', 'subtotal',
       'num_distinct_items', 'min_item_price', 'max_item_price',
       'total_onshift_dashers', 'total_busy_dashers',
       'total_outstanding_orders', 'estimated_order_place_duration',
       'estimated_store_to_consumer_driving_duration']].copy()
data

Unnamed: 0,delivery_id,market_id,created_at,actual_delivery_time,store_id,store_primary_category,order_protocol,total_items,subtotal,num_distinct_items,min_item_price,max_item_price,total_onshift_dashers,total_busy_dashers,total_outstanding_orders,estimated_order_place_duration,estimated_store_to_consumer_driving_duration
0,0,1.0,2015-02-06 22:24:17,2015-02-06 23:27:16,1845,american,1.0,4,3441,4,557,1239,33.0,14.0,21.0,446,861.0
1,1,2.0,2015-02-10 21:49:25,2015-02-10 22:56:29,5477,mexican,2.0,1,1900,1,1400,1400,1.0,2.0,2.0,446,690.0
2,2,3.0,2015-01-22 20:39:28,2015-01-22 21:09:09,5477,,1.0,1,1900,1,1900,1900,1.0,0.0,0.0,446,690.0
3,3,3.0,2015-02-03 21:21:45,2015-02-03 22:13:00,5477,,1.0,6,6900,5,600,1800,1.0,1.0,2.0,446,289.0
4,4,3.0,2015-02-15 02:40:36,2015-02-15 03:20:26,5477,,1.0,3,3900,3,1100,1600,6.0,6.0,9.0,446,650.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197423,197423,1.0,2015-02-17 00:19:41,2015-02-17 01:24:48,2956,fast,4.0,3,1389,3,345,649,17.0,17.0,23.0,251,331.0
197424,197424,1.0,2015-02-13 00:01:59,2015-02-13 00:58:22,2956,fast,4.0,6,3010,4,405,825,12.0,11.0,14.0,251,915.0
197425,197425,1.0,2015-01-24 04:46:08,2015-01-24 05:36:16,2956,fast,4.0,5,1836,3,300,399,39.0,41.0,40.0,251,795.0
197426,197426,1.0,2015-02-01 18:18:15,2015-02-01 19:23:22,3630,sandwich,1.0,1,1175,1,535,535,7.0,7.0,12.0,446,384.0


In [27]:
data.columns

Index(['delivery_id', 'market_id', 'created_at', 'actual_delivery_time',
       'store_id', 'store_primary_category', 'order_protocol', 'total_items',
       'subtotal', 'num_distinct_items', 'min_item_price', 'max_item_price',
       'total_onshift_dashers', 'total_busy_dashers',
       'total_outstanding_orders', 'estimated_order_place_duration',
       'estimated_store_to_consumer_driving_duration'],
      dtype='object')

### Parse Dates & Times from Datetime columns

In [28]:
# Convert both timestamp columns from strings to datetime64
data['created_at'] = pd.to_datetime(data['created_at'])
data['actual_delivery_time'] = pd.to_datetime(data['actual_delivery_time'])

# Split the order creation timestamp into separate date and time columns
data['created_at_date'] = data['created_at'].dt.date
data['created_at_time'] = data['created_at'].dt.time

# Split the delivery timestamp the same way. The date is extracted first,
# because the next line overwrites 'actual_delivery_time' with the time of day only.
data['actual_delivery_date'] = data['actual_delivery_time'].dt.date
data['actual_delivery_time'] = data['actual_delivery_time'].dt.time

# Reorder the columns; the original 'created_at' column is dropped here
data = data[['delivery_id', 'market_id', 'created_at_date', 'created_at_time', 'actual_delivery_date', 'actual_delivery_time',
             'store_id', 'store_primary_category', 'order_protocol', 'total_items', 'subtotal', 'num_distinct_items',
             'min_item_price', 'max_item_price', 'total_onshift_dashers', 'total_busy_dashers', 'total_outstanding_orders',
             'estimated_order_place_duration', 'estimated_store_to_consumer_driving_duration']].copy()

data

Unnamed: 0,delivery_id,market_id,created_at_date,created_at_time,actual_delivery_date,actual_delivery_time,store_id,store_primary_category,order_protocol,total_items,subtotal,num_distinct_items,min_item_price,max_item_price,total_onshift_dashers,total_busy_dashers,total_outstanding_orders,estimated_order_place_duration,estimated_store_to_consumer_driving_duration
0,0,1.0,2015-02-06,22:24:17,2015-02-06,23:27:16,1845,american,1.0,4,3441,4,557,1239,33.0,14.0,21.0,446,861.0
1,1,2.0,2015-02-10,21:49:25,2015-02-10,22:56:29,5477,mexican,2.0,1,1900,1,1400,1400,1.0,2.0,2.0,446,690.0
2,2,3.0,2015-01-22,20:39:28,2015-01-22,21:09:09,5477,,1.0,1,1900,1,1900,1900,1.0,0.0,0.0,446,690.0
3,3,3.0,2015-02-03,21:21:45,2015-02-03,22:13:00,5477,,1.0,6,6900,5,600,1800,1.0,1.0,2.0,446,289.0
4,4,3.0,2015-02-15,02:40:36,2015-02-15,03:20:26,5477,,1.0,3,3900,3,1100,1600,6.0,6.0,9.0,446,650.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197423,197423,1.0,2015-02-17,00:19:41,2015-02-17,01:24:48,2956,fast,4.0,3,1389,3,345,649,17.0,17.0,23.0,251,331.0
197424,197424,1.0,2015-02-13,00:01:59,2015-02-13,00:58:22,2956,fast,4.0,6,3010,4,405,825,12.0,11.0,14.0,251,915.0
197425,197425,1.0,2015-01-24,04:46:08,2015-01-24,05:36:16,2956,fast,4.0,5,1836,3,300,399,39.0,41.0,40.0,251,795.0
197426,197426,1.0,2015-02-01,18:18:15,2015-02-01,19:23:22,3630,sandwich,1.0,1,1175,1,535,535,7.0,7.0,12.0,446,384.0
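
Task 5 (delivery_time = actual time minus created time) is easiest while both columns are still full timestamps, since a `datetime.time` value alone loses the date and cannot be subtracted directly. The following is a minimal sketch under that assumption: it works on a stand-in `sample` frame with three rows copied from the output above, and the column names `delivery_time` and `delivery_minutes` are illustrative choices, not names used elsewhere in the notebook.

```python
import pandas as pd

# Three rows copied from the head() output, kept as full timestamps.
sample = pd.DataFrame({
    "created_at": ["2015-02-06 22:24:17", "2015-02-10 21:49:25", "2015-01-22 20:39:28"],
    "actual_delivery_time": ["2015-02-06 23:27:16", "2015-02-10 22:56:29", "2015-01-22 21:09:09"],
})
sample["created_at"] = pd.to_datetime(sample["created_at"])
sample["actual_delivery_time"] = pd.to_datetime(sample["actual_delivery_time"])

# Subtracting two datetime64 columns yields a Timedelta column.
sample["delivery_time"] = sample["actual_delivery_time"] - sample["created_at"]

# A plain numeric column (minutes) is usually easier to average and plot.
sample["delivery_minutes"] = sample["delivery_time"].dt.total_seconds() / 60

print(sample[["delivery_time", "delivery_minutes"]])
```

In the notebook, this would mean computing the difference before the date/time split, or keeping the original timestamp columns alongside the split ones.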


## Data Cleaning
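
One possible approach to task 6 (reformatting the dollar amounts) is sketched below. It assumes that `subtotal`, `min_item_price`, and `max_item_price` are stored as integer cents, which the values in the output (for example 3441 and 557) suggest but which the notebook does not confirm. The `sample` frame is a stand-in built from the first three displayed rows.

```python
import pandas as pd

# Stand-in frame with the price columns of the first three displayed rows.
sample = pd.DataFrame({
    "subtotal": [3441, 1900, 1900],
    "min_item_price": [557, 1400, 1900],
    "max_item_price": [1239, 1400, 1900],
})

price_columns = ["subtotal", "min_item_price", "max_item_price"]

# ASSUMPTION: the raw values are cents, so dividing by 100 gives dollars.
sample[price_columns] = sample[price_columns] / 100

print(sample)
```

Keeping the columns numeric (rather than formatting them as strings with a `$` sign) leaves them usable for sums and averages later on.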

## Exploratory Data Analysis
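
The questions about which market or order protocol has the most deliveries and the quickest average delivery could be answered with a grouped aggregation. This is a rough sketch on the five rows from the `head()` output, not a result for the full dataset; the `delivery_minutes` column and the `by_market` / `by_protocol` names are assumptions introduced for illustration.

```python
import pandas as pd

# The five rows from the head() output, kept as full timestamps.
sample = pd.DataFrame({
    "market_id": [1.0, 2.0, 3.0, 3.0, 3.0],
    "order_protocol": [1.0, 2.0, 1.0, 1.0, 1.0],
    "created_at": pd.to_datetime([
        "2015-02-06 22:24:17", "2015-02-10 21:49:25", "2015-01-22 20:39:28",
        "2015-02-03 21:21:45", "2015-02-15 02:40:36",
    ]),
    "actual_delivery_time": pd.to_datetime([
        "2015-02-06 23:27:16", "2015-02-10 22:56:29", "2015-01-22 21:09:09",
        "2015-02-03 22:13:00", "2015-02-15 03:20:26",
    ]),
})

# Delivery duration in minutes (hypothetical helper column).
duration = sample["actual_delivery_time"] - sample["created_at"]
sample["delivery_minutes"] = duration.dt.total_seconds() / 60

# Delivery count and mean duration per market, quickest market first.
# Rows with a missing market_id are dropped by groupby by default.
by_market = (
    sample.groupby("market_id")["delivery_minutes"]
    .agg(["count", "mean"])
    .sort_values("mean")
)

# The same summary per order protocol.
by_protocol = (
    sample.groupby("order_protocol")["delivery_minutes"]
    .agg(["count", "mean"])
    .sort_values("mean")
)

print(by_market)
print(by_protocol)
```

The resulting `count` and `mean` columns map directly onto the "most/least deliveries" and "quickest/slowest on average" questions, and the two frames could feed the bar charts planned for the Visualizations section.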

## Visualizations

## Model Building

## Conclusion