# Task 2 - Diminos Case Study

## Diminos Store - Delivery Time



### Problem Statement 🍕   
Kanav has started his own pizza store by taking a franchise from the popular pizza brand Diminos.  
Diminos promises to deliver each pizza order within 31 minutes of the time the order was placed; otherwise the pizza is free for the customer.  
To increase revenue and profit, Kanav runs the store 24 * 7.  
Recently, Diminos notified Kanav that it will measure store performance with one metric: the 95th percentile of order delivery time should be less than 31 minutes.  
Kanav is worried that he might lose the franchise if he cannot meet the metric. He wants your help to understand his store's performance so that he can take action to protect his business.
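
To make the metric concrete, here is a minimal sketch of how a 95th percentile is checked against the 31-minute target. The sample values are hypothetical and are not taken from the dataset; the names `sample_minutes` and `meets_metric` are illustrative only.

```python
import numpy as np

# Hypothetical delivery times in minutes (NOT taken from the dataset).
sample_minutes = np.array(
    [12, 14, 15, 15, 16, 17, 18, 19, 20, 22,
     23, 24, 25, 26, 27, 28, 29, 30, 33, 45],
    dtype=float,
)

# 95th percentile using NumPy's default linear interpolation.
p95 = np.percentile(sample_minutes, 95)

# The metric is met only if the 95th percentile is strictly below 31 minutes.
meets_metric = p95 < 31
print(f"95th percentile: {p95:.2f} min, meets metric: {meets_metric}")
```

In this toy sample two of the twenty orders take longer than 31 minutes, which is more than 5% of orders, so the sample would not meet the target.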

## TASK

Assume that you are a freelance data scientist.   


Help Kanav by analyzing the data and sharing insights to keep his business up and running.


## Load the required libraries

In [41]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Load the data

In [73]:
df = pd.read_csv(r"/content/diminos_data.csv")

## View the data 

In [74]:
df.head()

Unnamed: 0,order_id,order_placed_at,order_delivered_at
0,1523111,2023-03-01 00:00:59,2023-03-01 00:18:07.443132
1,1523112,2023-03-01 00:03:59,2023-03-01 00:19:34.925241
2,1523113,2023-03-01 00:07:22,2023-03-01 00:22:28.291385
3,1523114,2023-03-01 00:07:47,2023-03-01 00:46:19.019399
4,1523115,2023-03-01 00:09:03,2023-03-01 00:25:13.619056


## Information about data

In [75]:
# basic info

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   order_id            15000 non-null  int64 
 1   order_placed_at     15000 non-null  object
 2   order_delivered_at  15000 non-null  object
dtypes: int64(1), object(2)
memory usage: 351.7+ KB


In [76]:
## Describe the data

df.describe()

Unnamed: 0,order_id
count,15000.0
mean,1530610.0
std,4330.271
min,1523111.0
25%,1526861.0
50%,1530610.0
75%,1534360.0
max,1538110.0


In [77]:

df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
order_id,15000.0,1530610.5,4330.271354,1523111.0,1526860.75,1530610.5,1534360.25,1538110.0


## Check columns

In [78]:
df.columns

Index(['order_id', 'order_placed_at', 'order_delivered_at'], dtype='object')

## Duplicated values

In [79]:
# check the duplicates

df.duplicated().sum()

0

## Unique values in data

In [81]:
df['order_id'].nunique()

15000

In [82]:
df['order_placed_at'].nunique()

14953

In [83]:
df['order_delivered_at'].nunique()

15000

## Find the null values

In [84]:
df.isnull().sum()

order_id              0
order_placed_at       0
order_delivered_at    0
dtype: int64

In [85]:
df.isnull().sum().sum()

0

## Check datatypes

In [86]:
df.dtypes

order_id               int64
order_placed_at       object
order_delivered_at    object
dtype: object

## Check Shape

In [87]:
df.shape

(15000, 3)

In [88]:
df.head(2).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   order_id            2 non-null      int64 
 1   order_placed_at     2 non-null      object
 2   order_delivered_at  2 non-null      object
dtypes: int64(1), object(2)
memory usage: 176.0+ bytes


In [89]:
df.head(3)

Unnamed: 0,order_id,order_placed_at,order_delivered_at
0,1523111,2023-03-01 00:00:59,2023-03-01 00:18:07.443132
1,1523112,2023-03-01 00:03:59,2023-03-01 00:19:34.925241
2,1523113,2023-03-01 00:07:22,2023-03-01 00:22:28.291385


# Report:
The date columns have the wrong datatype.

`order_placed_at` and `order_delivered_at` hold datetime values but are stored as `object`, so both columns need to be converted to datetime.
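
Before converting, it can help to confirm that every string actually parses. The following is a minimal sketch on a hypothetical toy frame (the malformed value is invented for illustration and does not come from the dataset). It relies on the `errors="coerce"` option of `pd.to_datetime`, which turns unparseable strings into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical rows in the dataset's format, plus one invented malformed value.
toy = pd.DataFrame({
    "order_placed_at": ["2023-03-01 00:00:59", "2023-03-01 00:03:59", "not a date"],
})

# errors="coerce" converts unparseable strings to NaT rather than raising.
parsed = pd.to_datetime(toy["order_placed_at"], errors="coerce")

# Count the values that failed to parse so they can be inspected.
n_bad = parsed.isna().sum()
print(parsed.dtype, n_bad)
```

Since the real columns contain no nulls before conversion, a count of zero after conversion would suggest that every timestamp parsed cleanly.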

In [90]:
import datetime

In [91]:
df['order_placed_at'] = pd.to_datetime(df['order_placed_at'])
df['order_delivered_at'] = pd.to_datetime(df['order_delivered_at'])

In [92]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   order_id            15000 non-null  int64         
 1   order_placed_at     15000 non-null  datetime64[ns]
 2   order_delivered_at  15000 non-null  datetime64[ns]
dtypes: datetime64[ns](2), int64(1)
memory usage: 351.7 KB


## Check First Order and Last Order

In [93]:
print("First Order :", min(df['order_placed_at']))
print("Last Order :", max(df['order_placed_at']))

First Order : 2023-03-01 00:00:59
Last Order : 2023-03-27 23:58:20


## Check First Delivery and Last Delivery

In [94]:
print("First Delivery :", min(df['order_delivered_at']))
print("Last Delivery :", max(df['order_delivered_at']))

First Delivery : 2023-03-01 00:18:07.443132
Last Delivery : 2023-03-29 02:42:50.645252
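
The last order was placed on 2023-03-27 at 23:58:20, but the last delivery is recorded on 2023-03-29 at 02:42:50, so at least one order took more than a day to deliver. A minimal sketch for surfacing such orders is shown below. The toy rows are hypothetical, and the 24-hour threshold is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical orders (NOT from the dataset); order 2 takes 26 hours.
toy = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_placed_at": pd.to_datetime(
        ["2023-03-01 10:00:00", "2023-03-01 11:00:00", "2023-03-01 12:00:00"]
    ),
    "order_delivered_at": pd.to_datetime(
        ["2023-03-01 10:20:00", "2023-03-02 13:00:00", "2023-03-01 12:25:00"]
    ),
})

# Duration between placement and delivery as a timedelta Series.
duration = toy["order_delivered_at"] - toy["order_placed_at"]

# Keep only the orders whose delivery took longer than 24 hours.
long_orders = toy[duration > pd.Timedelta(hours=24)]
print(long_orders)
```

Orders like these may be data-entry errors or genuine failures, and they are worth reviewing separately because they strongly affect the mean but barely move the 95th percentile.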


## Subtract order placed time from order delivered time

In [95]:
df['deliverey_time'] = df['order_delivered_at'] -  df['order_placed_at']

In [96]:
df.head()

Unnamed: 0,order_id,order_placed_at,order_delivered_at,deliverey_time
0,1523111,2023-03-01 00:00:59,2023-03-01 00:18:07.443132,0 days 00:17:08.443132
1,1523112,2023-03-01 00:03:59,2023-03-01 00:19:34.925241,0 days 00:15:35.925241
2,1523113,2023-03-01 00:07:22,2023-03-01 00:22:28.291385,0 days 00:15:06.291385
3,1523114,2023-03-01 00:07:47,2023-03-01 00:46:19.019399,0 days 00:38:32.019399
4,1523115,2023-03-01 00:09:03,2023-03-01 00:25:13.619056,0 days 00:16:10.619056


In [97]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype          
---  ------              --------------  -----          
 0   order_id            15000 non-null  int64          
 1   order_placed_at     15000 non-null  datetime64[ns] 
 2   order_delivered_at  15000 non-null  datetime64[ns] 
 3   deliverey_time      15000 non-null  timedelta64[ns]
dtypes: datetime64[ns](2), int64(1), timedelta64[ns](1)
memory usage: 468.9 KB


In [101]:
df["deliverey_time"] = df["deliverey_time"].dt.total_seconds()/60

In [102]:
df.head()

Unnamed: 0,order_id,order_placed_at,order_delivered_at,deliverey_time
0,1523111,2023-03-01 00:00:59,2023-03-01 00:18:07.443132,17.140719
1,1523112,2023-03-01 00:03:59,2023-03-01 00:19:34.925241,15.598754
2,1523113,2023-03-01 00:07:22,2023-03-01 00:22:28.291385,15.104856
3,1523114,2023-03-01 00:07:47,2023-03-01 00:46:19.019399,38.533657
4,1523115,2023-03-01 00:09:03,2023-03-01 00:25:13.619056,16.176984
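
The two steps above (subtracting the timestamps, then converting to minutes) can be combined into one small helper. This is a sketch only: the function name `delivery_minutes` is hypothetical, and the input is the first row of the dataset as shown in the output above.

```python
import pandas as pd

def delivery_minutes(placed: pd.Series, delivered: pd.Series) -> pd.Series:
    """Return delivery durations in minutes as a float Series."""
    return (delivered - placed).dt.total_seconds() / 60

# First row of the dataset, copied from the output shown earlier.
placed = pd.Series(pd.to_datetime(["2023-03-01 00:00:59"]))
delivered = pd.Series(pd.to_datetime(["2023-03-01 00:18:07.443132"]))

minutes = delivery_minutes(placed, delivered)

# A negative duration would mean a delivery recorded before its order.
has_negative = (minutes < 0).any()
print(minutes.iloc[0], has_negative)
```

Checking for negative durations is a cheap sanity test before computing any percentile.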


In [103]:
df['deliverey_time'].quantile(0.95)

27.261043996666658
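
Alongside the percentile, the share of orders that miss the 31-minute promise is a useful figure, since each of those pizzas is free for the customer. A minimal sketch, using only the first five delivery times from the output above (rounded to one decimal); the names `TARGET_MINUTES` and `late_share` are illustrative.

```python
import pandas as pd

# First five delivery times from the notebook output, rounded to one decimal.
minutes = pd.Series([17.1, 15.6, 15.1, 38.5, 16.2])

TARGET_MINUTES = 31  # delivery promise from the problem statement

# The mean of a boolean Series is the fraction of True values.
late_share = (minutes >= TARGET_MINUTES).mean()
print(f"Share of late orders: {late_share:.0%}")
```

Five rows are not representative of the full dataset; on the real data the same expression would be applied to `df['deliverey_time']`.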

In [105]:
_95th_percentile = round(df['deliverey_time'].quantile(0.95))

In [106]:
_95th_percentile 

27

In [110]:
# Compare the unrounded 95th percentile against the 31-minute target;
# rounding first could hide a value such as 31.4.
if df['deliverey_time'].quantile(0.95) >= 31:
    print("The store does not meet the metric: the 95th percentile delivery time is 31 minutes or more.")
else:
    print("The store meets the metric: the 95th percentile delivery time is under 31 minutes.")

The store meets the metric: the 95th percentile delivery time is under 31 minutes.
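
An overall 95th percentile can hide hours of the day in which the store performs worse. One possible follow-up, sketched here on hypothetical toy data rather than the real dataset, is to compute the same percentile per hour of order placement. The grouping by hour is an assumption about what might be useful to Kanav, not part of the original analysis.

```python
import pandas as pd

# Hypothetical orders (NOT from the dataset): two at midnight, three in the evening.
toy = pd.DataFrame({
    "order_placed_at": pd.to_datetime([
        "2023-03-01 00:05:00", "2023-03-01 00:40:00",
        "2023-03-01 19:10:00", "2023-03-01 19:30:00", "2023-03-01 19:50:00",
    ]),
    "deliverey_time": [16.0, 18.0, 25.0, 35.0, 45.0],  # minutes
})

# Hour of day (0-23) in which each order was placed.
hour = toy["order_placed_at"].dt.hour.rename("hour")

# 95th percentile of delivery time within each hour.
p95_by_hour = toy.groupby(hour)["deliverey_time"].quantile(0.95)

# Hours whose 95th percentile is at or above the 31-minute target.
breaching_hours = p95_by_hour[p95_by_hour >= 31]
print(breaching_hours)
```

Hours that appear in `breaching_hours` would be natural candidates for extra staff or delivery capacity.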
