### Kaggle Food Delivery Dataset
https://www.kaggle.com/datasets/gauravmalik26/food-delivery-dataset?resource=download

---
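#### Q. How did the operations team calculate the number of riders it needed?
- Daily orders: 800 expected in weeks 1 to 2, 1,000 expected in weeks 4 to 9
- Rider activity rate: 95%, with each rider expected to handle 1 to 1.5 deliveries
- Let's check whether riders actually worked as the operations team expected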
---
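The brief does not spell out the formula. One plausible reading, assumed here rather than taken from the source, is contracted riders = daily orders / (activity rate × deliveries per active rider). A minimal Python sketch of that arithmetic (the helper `riders_needed` is hypothetical):

```python
import math

def riders_needed(daily_orders: float, activity_rate: float, per_rider: float) -> int:
    """Contracted riders so that the active share can cover every order (rounded up).

    Assumed planning formula: orders / (activity_rate * deliveries per active rider).
    """
    return math.ceil(daily_orders / (activity_rate * per_rider))

# Range implied by the "1 to 1.5 deliveries per rider" assumption at 95% activity
for orders in (800, 1000):
    low = riders_needed(orders, 0.95, 1.5)   # busier riders -> fewer contracts needed
    high = riders_needed(orders, 0.95, 1.0)  # one delivery each -> most contracts needed
    print(f"{orders} orders/day -> {low} to {high} contracted riders")
```

Under this assumption, the ledger counts used later in the notebook (599 or 600 in the first weeks, 720 afterwards) would correspond to roughly 800 / (0.95 × 600) ≈ 1.40 and 1000 / (0.95 × 720) ≈ 1.46 deliveries per active rider, near the upper end of the 1 to 1.5 range.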

In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

In [2]:
df = pd.read_pickle('data/df.pkl')
print(df.shape)
df.head(2)

(41368, 25)


| | ID | Delivery_person_ID | Delivery_person_Age | Delivery_person_Ratings | Restaurant_latitude | Restaurant_longitude | Delivery_location_latitude | Delivery_location_longitude | Order_Date | Weatherconditions | ... | Festival | City | Time_taken(min) | order_time | pick_time | orpi_time_diff_m | distance_km | speed_km/h | dayoftheweek | city_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0x4607 | INDORES13DEL02 | 37.0 | 4.9 | 22.745049 | 75.892471 | 22.765049 | 75.912471 | 2022-03-19 | Sunny | ... | No | Urban | 24.0 | 2022-03-19 11:30:00 | 2022-03-19 11:45:00 | 15.0 | 3.0 | 7.5 | SAT | INDO |
| 1 | 0xb379 | BANGRES18DEL02 | 34.0 | 4.5 | 12.913041 | 77.683237 | 13.043041 | 77.813237 | 2022-03-25 | Stormy | ... | No | Metropolitian | 33.0 | 2022-03-25 19:45:00 | 2022-03-25 19:50:00 | 5.0 | 20.2 | 36.7 | FRI | BANG |


In [3]:
## Build today_orders: orders, active riders and orders per active rider for each date
today_orders = (
    df.groupby('Order_Date', sort=False)                      # keep first-appearance order of dates
      .agg(day_order=('ID', 'size'),                          # order rows that day
           day_rider=('Delivery_person_ID', 'nunique'))       # distinct active riders that day
      .reset_index()
      .rename(columns={'Order_Date': 'day'})
)
today_orders['delivery_mean'] = today_orders['day_order'] / today_orders['day_rider']

## ISO week number minus 5, so ISO week 6 (2022-02-07 to 2022-02-13) becomes week 1
today_orders['weekofyear'] = today_orders['day'].dt.isocalendar().week.astype(int) - 5
today_orders.head()

| | day | day_order | day_rider | delivery_mean | weekofyear |
|---|---|---|---|---|---|
| 0 | 2022-03-19 | 1036 | 666 | 1.555556 | 6 |
| 1 | 2022-03-25 | 888 | 614 | 1.446254 | 7 |
| 2 | 2022-04-05 | 1041 | 663 | 1.570136 | 9 |
| 3 | 2022-03-26 | 1044 | 672 | 1.553571 | 7 |
| 4 | 2022-03-11 | 1020 | 650 | 1.569231 | 5 |
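The same per-day aggregation can be checked on a tiny hypothetical frame (made-up IDs and dates, not the Kaggle data): `size` counts every order row for a date, `nunique` counts distinct riders, and subtracting 5 from the ISO week turns ISO week 6 of 2022 (7 to 13 February) into week 1.

```python
import pandas as pd

# Hypothetical toy orders: two dates, three riders (illustrative only)
toy = pd.DataFrame({
    'ID': ['a1', 'a2', 'a3', 'b1', 'b2'],
    'Order_Date': pd.to_datetime(['2022-02-11', '2022-02-11', '2022-02-11',
                                  '2022-03-19', '2022-03-19']),
    'Delivery_person_ID': ['R1', 'R1', 'R2', 'R3', 'R3'],
})

daily = (
    toy.groupby('Order_Date', sort=False)
       .agg(day_order=('ID', 'size'),                      # orders placed that day
            day_rider=('Delivery_person_ID', 'nunique'))   # distinct active riders
       .reset_index()
       .rename(columns={'Order_Date': 'day'})
)
daily['delivery_mean'] = daily['day_order'] / daily['day_rider']
# ISO week minus 5: 2022-02-11 (ISO week 6) -> 1, 2022-03-19 (ISO week 11) -> 6
daily['weekofyear'] = daily['day'].dt.isocalendar().week.astype(int) - 5
print(daily)
```

On 2022-02-11 rider R1 handles two orders and R2 one, so three orders over two riders gives 1.5 deliveries per active rider.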


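#### Add the following variables to today_orders
- contract_delivery : number of contracted riders
- absence : share of contracted riders who did not work that day, in % ((1 - active riders that day / contracted riders) × 100)
- delivery_mean : average deliveries per active rider that day (orders that day / active riders that day)
- 100_delivery_mean : average deliveries per rider if 100% of contracted riders worked (orders that day / contracted riders)
- 95_delivery_mean : average deliveries per rider if 95% of contracted riders worked (orders that day / (contracted riders × 0.95, rounded down))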
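Written as formulas for a day $d$, with $O_d$ orders, $R_d$ distinct active riders and $C_d$ contracted riders (symbols introduced here only for readability), the definitions above read as follows. The floor in the last term reflects the rounding down to whole riders done in the code, and the example values reproduce the 2022-03-19 row of the table further down.

```latex
\text{absence}_d = \left(1 - \frac{R_d}{C_d}\right) \times 100, \qquad
\text{delivery\_mean}_d = \frac{O_d}{R_d}, \qquad
\text{100\_delivery\_mean}_d = \frac{O_d}{C_d}, \qquad
\text{95\_delivery\_mean}_d = \frac{O_d}{\lfloor 0.95\, C_d \rfloor}

% Example, 2022-03-19: O_d = 1036, R_d = 666, C_d = 720, \lfloor 0.95 \cdot 720 \rfloor = 684
\text{absence} = \left(1 - \tfrac{666}{720}\right) \times 100 = 7.5, \quad
\tfrac{1036}{666} \approx 1.5556, \quad
\tfrac{1036}{720} \approx 1.4389, \quad
\tfrac{1036}{684} \approx 1.5146
```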

In [4]:
## Daily number of contracted riders, taken from the operations team's contract ledger
today_orders['contract_delivery'] = np.nan   # numeric placeholder until a ledger period applies

today_orders.loc[today_orders['day'] <= '2022-02-13', 'contract_delivery'] = 599
today_orders.loc[(today_orders['day'] > '2022-02-13') & (today_orders['day'] < '2022-02-19'), 'contract_delivery'] = 600
# 2022-02-19 itself matches none of these conditions, so that day keeps NaN
today_orders.loc[(today_orders['day'] > '2022-02-19') & (today_orders['day'] < '2022-04-04'), 'contract_delivery'] = 720
today_orders.loc[today_orders['day'] > '2022-04-03', 'contract_delivery'] = 719
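Because the ledger is a set of date ranges, an explicit interval table makes gaps easy to spot. A sketch, assuming `day` holds date-only timestamps; the sentinel bounds (2000-01-01, 2099-12-31) and the `contract_count` helper are hypothetical, and the ranges simply mirror the `.loc` conditions above:

```python
import numpy as np
import pandas as pd

# Contract ledger as inclusive date ranges mirroring the .loc conditions above.
# Assumes date-only values, so "> 2022-02-13" starts on 2022-02-14.
# 2000-01-01 and 2099-12-31 are placeholder bounds for "open-ended".
ledger = pd.DataFrame({
    'start': pd.to_datetime(['2000-01-01', '2022-02-14', '2022-02-20', '2022-04-04']),
    'end':   pd.to_datetime(['2022-02-13', '2022-02-18', '2022-04-03', '2099-12-31']),
    'contract_delivery': [599, 600, 720, 719],
})
periods = pd.IntervalIndex.from_arrays(ledger['start'], ledger['end'], closed='both')

def contract_count(days: pd.Series) -> pd.Series:
    """Contracted riders per day; NaN for days that fall in no ledger period."""
    pos = periods.get_indexer(pd.DatetimeIndex(days))   # -1 where no period matches
    counts = ledger['contract_delivery'].to_numpy(dtype=float)
    return pd.Series(np.where(pos >= 0, counts[pos], np.nan), index=days.index)

days = pd.Series(pd.to_datetime(['2022-02-11', '2022-02-19', '2022-03-19', '2022-04-05']))
print(contract_count(days).tolist())
```

With these ranges 2022-02-19 comes back as NaN, which makes it visible that the conditions skip that date; which period it belongs to would have to come from the ledger itself.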

In [5]:
## Absence rate (%) from the number of contracted riders and the riders actually active that day
today_orders['absence'] = (1- (today_orders.day_rider / today_orders.contract_delivery)) *100

In [6]:
## Average deliveries per rider if 100% or 95% of contracted riders were active
today_orders['100_delivery_mean'] = today_orders['day_order'] / today_orders['contract_delivery']
# 95% of contracted riders, rounded down to whole riders (np.floor keeps NaN instead of failing)
today_orders['95_delivery_mean'] = today_orders['day_order'] / np.floor(today_orders['contract_delivery'] * 0.95)
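Rounding the 95% head count down only matters when 0.95 × contracted riders is fractional: 0.95 × 720 is exactly 684, while 0.95 × 719 = 683.05 becomes 683. A small check using the order and contract counts of the 2022-03-19 and 2022-04-05 rows from the output table below (a casting to `int` also cannot represent NaN, whereas `np.floor` passes it through):

```python
import numpy as np

# Values copied from two rows of the today_orders output (2022-03-19, 2022-04-05)
day_order = np.array([1036, 1041])
contract = np.array([720.0, 719.0])

floored = day_order / np.floor(contract * 0.95)   # whole riders, rounded down
exact = day_order / (contract * 0.95)              # fractional-rider divisor, for comparison
print(np.round(floored, 6), np.round(exact, 6))
```

The floored values match the `95_delivery_mean` column (1.51462 and 1.524158); the fractional divisor would give about 1.524047 for 2022-04-05.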

In [8]:
today_orders = today_orders[['weekofyear', 'day', 'day_order', 'day_rider', 'contract_delivery', 'absence', 'delivery_mean', 
                             '100_delivery_mean','95_delivery_mean']]
print(today_orders.shape)
today_orders.head()

(44, 9)


| | weekofyear | day | day_order | day_rider | contract_delivery | absence | delivery_mean | 100_delivery_mean | 95_delivery_mean |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2022-03-19 | 1036 | 666 | 720.0 | 7.5 | 1.555556 | 1.438889 | 1.51462 |
| 1 | 7 | 2022-03-25 | 888 | 614 | 720.0 | 14.722222 | 1.446254 | 1.233333 | 1.298246 |
| 2 | 9 | 2022-04-05 | 1041 | 663 | 719.0 | 7.788595 | 1.570136 | 1.447844 | 1.524158 |
| 3 | 7 | 2022-03-26 | 1044 | 672 | 720.0 | 6.666667 | 1.553571 | 1.45 | 1.526316 |
| 4 | 5 | 2022-03-11 | 1020 | 650 | 720.0 | 9.722222 | 1.569231 | 1.416667 | 1.491228 |


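#### Add the following variables to check weekly deliveries per rider and absence rate
- week_absence : weekly rider absence rate (%)
- 100_week_mean : weekly average deliveries per rider if 100% of contracted riders worked
- 95_week_mean : weekly average deliveries per rider if 95% of contracted riders worked
- week_mean : weekly average deliveries per active rider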

In [16]:
## Weekly averages of the daily metrics, rounded to 2 decimals
## (100_week_mean / 95_week_mean average the daily 100_delivery_mean / 95_delivery_mean)
week_comparison = (
    today_orders.groupby('weekofyear')
                .agg(week_mean=('delivery_mean', 'mean'),
                     **{'100_week_mean': ('100_delivery_mean', 'mean'),
                        '95_week_mean': ('95_delivery_mean', 'mean')},
                     week_absence=('absence', 'mean'))
                .round(2)
                .reset_index()
                .rename(columns={'weekofyear': 'week'})
)
week_comparison
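To answer the opening question, weekly averages can be compared with the team's assumptions (95% activity, i.e. at most 5% absence, and 1 to 1.5 deliveries per rider). A sketch with a hypothetical `flag_against_plan` helper, demonstrated only on the five daily rows shown in the today_orders preview, so the weekly numbers here are not the real weekly averages:

```python
import pandas as pd

def flag_against_plan(weekly: pd.DataFrame, max_absence: float = 5.0,
                      per_rider: tuple = (1.0, 1.5)) -> pd.DataFrame:
    """Mark weeks whose averages fall outside the operations team's assumptions.

    Hypothetical helper: expects 'week', 'week_mean' and 'week_absence' columns.
    """
    lo, hi = per_rider
    out = weekly.copy()
    out['absence_over_plan'] = out['week_absence'] > max_absence   # plan: 95% of riders active
    out['load_outside_plan'] = ~out['week_mean'].between(lo, hi)    # plan: 1 to 1.5 per rider (inclusive)
    return out

# Daily rows from the today_orders preview (head only, NOT full-week data)
preview = pd.DataFrame({
    'weekofyear': [6, 7, 9, 7, 5],
    'delivery_mean': [1.555556, 1.446254, 1.570136, 1.553571, 1.569231],
    'absence': [7.5, 14.722222, 7.788595, 6.666667, 9.722222],
})
weekly = (preview.groupby('weekofyear')
                 .agg(week_mean=('delivery_mean', 'mean'),
                      week_absence=('absence', 'mean'))
                 .round(2)
                 .reset_index()
                 .rename(columns={'weekofyear': 'week'}))
print(flag_against_plan(weekly))
```

On these preview rows every week's absence is above 5%, and only week 7 (mean 1.50) stays inside the 1 to 1.5 band; the same helper can be applied to the full weekly table once it is computed.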