### Context

The number of restaurants in New York is increasing day by day. Many students and busy professionals rely on these restaurants because of their hectic lifestyles, and online food delivery is a convenient way for them to get good food from their favorite restaurants. FoodHub, a food aggregator company, offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

### Objective

The food aggregator company has stored the data of the different orders made by registered customers in its online portal. The company wants to analyze the data to get a fair idea of the demand for different restaurants, which will help it enhance the customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some key questions that need to be answered. Perform the data analysis to answer these questions and help the company improve its business.

### Data Description

The data contains information about each food order. The detailed data dictionary is given below.

### Data Dictionary

* order_id: Unique ID of the order
* customer_id: ID of the customer who ordered the food
* restaurant_name: Name of the restaurant
* cuisine_type: Cuisine ordered by the customer
* cost_of_the_order: Cost of the order
* day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
* rating: Rating given by the customer out of 5
* food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
* delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off confirmation.
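The two time columns are described as differences between confirmation timestamps. The dataset itself only contains the resulting minutes, so the following is a minimal sketch of that calculation on a made-up event log; the timestamp column names (`restaurant_confirmed_at`, `pickup_confirmed_at`, `dropoff_confirmed_at`) and the helper `minutes_between` are assumptions for illustration, not part of the FoodHub data.

```python
import pandas as pd

# Hypothetical event log: the timestamp columns below are assumed names,
# they do not exist in foodhub_order.csv.
events = pd.DataFrame({
    "order_id": [1, 2],
    "restaurant_confirmed_at": pd.to_datetime(["2024-01-06 12:00:00", "2024-01-06 12:05:00"]),
    "pickup_confirmed_at": pd.to_datetime(["2024-01-06 12:25:00", "2024-01-06 12:35:00"]),
    "dropoff_confirmed_at": pd.to_datetime(["2024-01-06 12:45:00", "2024-01-06 13:03:00"]),
})


def minutes_between(start, end):
    """Return the elapsed time between two timestamp Series in whole minutes."""
    return ((end - start).dt.total_seconds() // 60).astype(int)


# Preparation time: restaurant confirmation -> pick-up confirmation
events["food_preparation_time"] = minutes_between(
    events["restaurant_confirmed_at"], events["pickup_confirmed_at"]
)
# Delivery time: pick-up confirmation -> drop-off confirmation
events["delivery_time"] = minutes_between(
    events["pickup_confirmed_at"], events["dropoff_confirmed_at"]
)

print(events[["order_id", "food_preparation_time", "delivery_time"]])
```

Whether the real pipeline rounds or truncates to whole minutes is not stated in the data dictionary; the sketch truncates.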

In [49]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Load the FoodHub order data
fh_original = pd.read_csv("foodhub_order.csv")
# print(fh_original.head(5))
# Summarize column dtypes and non-null counts
fh_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB
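
The `info()` output lists `rating` as `object` even though the data dictionary describes it as a score out of 5, which suggests the column may hold some non-numeric entries. The sketch below shows one way to inspect and convert such a column on toy data; the `"Not given"` placeholder is an assumption about how unrated orders might be stored, not something confirmed by the output above.

```python
import pandas as pd

# Toy data; "Not given" is an assumed placeholder for unrated orders.
ratings = pd.Series(["5", "Not given", "3", "4"], name="rating")

# Look at the distinct raw values to see why the column is not numeric.
print(ratings.unique())

# Coerce anything that is not a number to NaN so numeric summaries work.
ratings_numeric = pd.to_numeric(ratings, errors="coerce")

print(f"Unrated orders: {ratings_numeric.isna().sum()}")
print(f"Average rating of rated orders: {ratings_numeric.mean():.2f}")
```

On the real data, the same two steps would be applied to `fh_original['rating']`. Entries coerced to NaN would then count as missing, even though `info()` reports no null values in the raw column.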


In [None]:
# Shape and dtype checks (kept for reference)
# print(f"Total rows in the dataframe: {fh_original.shape[0]}")
# print(f"Total columns in the dataframe: {fh_original.shape[1]}")
# print(f"Datatype of 'order_id': {fh_original['order_id'].dtype}")

# Count missing values per column.
# In pandas, isnull() is an alias of isna(), so one check covers both None and NaN.
missing_per_column = fh_original.isna().sum()

# Keep only the columns that actually contain missing values
columns_with_missing = missing_per_column[missing_per_column > 0]

print('---------- Missing values (None/NaN) per column ----------')
if columns_with_missing.empty:
    print('There are no missing values in the data')
else:
    for column, missing_values_count in columns_with_missing.items():
        print(f"{column} has {missing_values_count} missing values")
print('-' * 77)

# Display the first 5 rows of the dataframe
# fh_original.head(5)

Question 4
### Meet the requirements
Statistical summary (min, max, average) of the time taken to prepare food

### Additional learning
* Does the cost of an order have a relation with food preparation time?
* Which cuisine is prepared fastest?
* Which expensive cuisine is prepared fastest?
* Time taken (min, max, average) to prepare food by cuisine type
* Time taken (min, max, average) to prepare food by restaurant
* Time taken (min, max, average) to prepare food by weekday or weekend
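
The grouped questions listed here can be approached with `groupby` plus `agg`. The following is a minimal sketch on a small made-up sample that reuses the column names from the data dictionary; the values and the helper `prep_time_summary` are illustrative assumptions, not results from the FoodHub data.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the data dictionary.
sample = pd.DataFrame({
    "cuisine_type": ["Korean", "Korean", "Japanese", "Japanese", "Mexican"],
    "day_of_the_week": ["Weekend", "Weekday", "Weekend", "Weekend", "Weekday"],
    "cost_of_the_order": [30.75, 12.08, 12.23, 29.20, 11.59],
    "food_preparation_time": [25, 23, 30, 20, 28],
})


def prep_time_summary(df, by):
    """Min, max, and mean preparation time (minutes) for each group in column `by`."""
    return df.groupby(by)["food_preparation_time"].agg(["min", "max", "mean"])


by_cuisine = prep_time_summary(sample, "cuisine_type")
by_day = prep_time_summary(sample, "day_of_the_week")
print(by_cuisine)
print(by_day)

# Cuisine with the lowest average preparation time
fastest_cuisine = by_cuisine["mean"].idxmin()
print(f"Fastest cuisine on average: {fastest_cuisine}")

# Pearson correlation between order cost and preparation time
cost_prep_corr = sample["cost_of_the_order"].corr(sample["food_preparation_time"])
print(f"Correlation between cost and preparation time: {cost_prep_corr:.2f}")
```

On the real data, the same helper could be called as `prep_time_summary(fh_original, "restaurant_name")` for the per-restaurant view. A correlation near zero would suggest that cost and preparation time are not linearly related, though it would not rule out other kinds of dependence.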


In [56]:
# Time taken to prepare food (the data dictionary gives this column in minutes)
prep_time = fh_original['food_preparation_time']

# Reference: prep_time.describe() reports
# count    1898.000000
# mean       27.371970
# std         4.632481
# min        20.000000
# 25%        23.000000
# 50%        27.000000
# 75%        31.000000
# max        35.000000

# Take the statistics directly from the column. Calling .mean()/.min()/.max() on the
# describe() output would instead aggregate the summary rows (count, mean, std, ...).
print(f"Food preparation average time in minutes is {prep_time.mean():.2f}")
print(f"Food preparation minimum time in minutes is {prep_time.min()}")
print(f"Food preparation maximum time in minutes is {prep_time.max()}")

Food preparation average time in minutes is 27.37
Food preparation minimum time in minutes is 20
Food preparation maximum time in minutes is 35
