## FoodHub Data Analysis

### Context

The number of restaurants in New York is increasing day by day. Many students and busy professionals rely on those restaurants because of their hectic lifestyles. An online food delivery service is a great option for them, as it brings good food from their favorite restaurants. FoodHub, a food aggregator company, offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

### Objective

The food aggregator company has stored the data of the different orders made by registered customers in its online portal. It wants to analyze the data to get a fair idea of the demand for different restaurants, which will help it enhance the customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company improve the business.

### Data Description

The dataset contains the attributes recorded for each food order. The detailed data dictionary is given below.

### Data Dictionary

* order_id: Unique ID of the order
* customer_id: ID of the customer who ordered the food
* restaurant_name: Name of the restaurant
* cuisine_type: Cuisine ordered by the customer
* cost_of_the_order: Cost of the order
* day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
* rating: Rating given by the customer out of 5
* food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
* delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off confirmation.

### Let us start by importing the required libraries

In [6]:

# import libraries for data manipulation
import numpy as np
import pandas as pd

# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

#suppress scientific notation when displaying numbers
pd.set_option('display.float_format', lambda x: '%.2f' % x)

#display all cols of df
pd.set_option('display.max_columns', None)


### Understanding the structure of the data

In [7]:
#read the data
df=pd.read_csv('foodhub_order.csv')
#make a copy to easily restore later if necessary
df_foodhub=df.copy()


## **Data Overview**

In [None]:
#first we look at the first few rows of data
df_foodhub.head()


#### **Observations:**
- The dataset consists of numerical and categorical values.
- The rating column is stored as strings: it holds numeric ratings alongside the text value 'Not given'. We will deal with this issue later on.

### **Question 1:** How many rows and columns are present in the data?



In [5]:
#prints the number of rows and columns

print(f'The dataset contains {df_foodhub.shape[0]} rows and {df_foodhub.shape[1]} columns')



#### **Observations:**

The dataset has 1898 rows and 9 columns


### **Question 2:** What are the datatypes of the different columns in the dataset? (The info() function can be used)

In [None]:
#returns the data types of the columns
df_foodhub.info()

#### **Observations:**
The dataset contains:

- 4 columns with the int64 data type (numerical)
- 4 columns with the object data type (categorical)
- 1 column with the float64 data type (numerical)

- Each column contains 1898 records and there are no null values.
- order_id and customer_id are identifiers rather than measurements. We will double check their uniqueness later to be certain.
- The rating column has the object (string) datatype because it mixes numeric ratings with the text 'Not given'. Generally, we want to transform fields with mixed content to avoid processing errors later on.
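
The uniqueness check mentioned above could look something like the following. This is a minimal sketch on a hypothetical toy frame (the `orders` name and its values are made up for illustration); in the notebook the same calls would be applied to `df_foodhub`.

```python
import pandas as pd

# Hypothetical toy data; the real notebook would use df_foodhub instead
orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004],
    'customer_id': [7, 7, 8, 9],
})

# True only if no order_id value repeats
order_id_is_unique = orders['order_id'].is_unique

# A customer can place several orders, so customer_id may repeat across rows
customer_id_is_unique = orders['customer_id'].is_unique
n_customers = orders['customer_id'].nunique()

print(order_id_is_unique, customer_id_is_unique, n_customers)
```

If `order_id` is unique but `customer_id` is not, the ID columns behave as expected: one row per order, with repeat customers appearing on several rows.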



#### **Treatment of Ratings:**

In [7]:
#cast ratings to numerical values

#returns unique values counts so we can determine what values exist.
print(df_foodhub['rating'].value_counts(), '\n')

#changes 'Not given' to numeric value of 0 and convert to float to match the other columns. casting to nan is another solution
df_foodhub['rating'] = df_foodhub['rating'].replace('Not given', 0).astype(np.float64)

#verify the change
print((df_foodhub['rating'].dtype), '\n')

Not given    736
5            588
4            386
3            188
Name: rating, dtype: int64 

float64 
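
The code comment above mentions casting to NaN as another option. A minimal sketch of that alternative, using a hypothetical toy Series rather than the real column, shows why the choice matters: zeros are counted in averages, while NaN values are skipped.

```python
import pandas as pd

# Hypothetical toy ratings mirroring the raw column ('Not given' plus numeric strings)
raw_rating = pd.Series(['5', 'Not given', '4', '3', 'Not given'])

# errors='coerce' turns anything non-numeric into NaN
rating_nan = pd.to_numeric(raw_rating, errors='coerce')

# Zero-filled version, equivalent to replacing 'Not given' with 0
rating_zero = rating_nan.fillna(0)

print(rating_nan.mean())   # mean over rated orders only (NaN is skipped)
print(rating_zero.mean())  # mean pulled down by the placeholder zeros
```

With the zero encoding used in this notebook, any statistic on `rating` (mean, quartiles, correlations) includes the unrated orders unless they are filtered out first.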



### **Question 3:** Are there any missing values in the data? If yes, treat them using an appropriate method





In [9]:
#checks for missing values in each column
df_foodhub.isnull().sum()

order_id                 0
customer_id              0
restaurant_name          0
cuisine_type             0
cost_of_the_order        0
day_of_the_week          0
rating                   0
food_preparation_time    0
delivery_time            0
dtype: int64

#### **Observations:**
- There are 0 null values in the dataset.


### **Question 4:** Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed?

In [8]:
#let's first use describe() to return summary statistics for all columns (include='all' covers categorical ones too).
#this will give us a sense of the distribution of values in each column.
df_foodhub.describe(include='all')

| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| count | 1898.0 | 1898.0 | 1898 | 1898 | 1898.0 | 1898 | 1898.0 | 1898.0 | 1898.0 |
| unique | | | 178 | 14 | | 2 | | | |
| top | | | Shake Shack | American | | Weekend | | | |
| freq | | | 219 | 584 | | 1351 | | | |
| mean | 1477495.5 | 171168.48 | | | 16.5 | | 2.66 | 27.37 | 24.16 |
| std | 548.05 | 113698.14 | | | 7.48 | | 2.2 | 4.63 | 4.97 |
| min | 1476547.0 | 1311.0 | | | 4.47 | | 0.0 | 20.0 | 15.0 |
| 25% | 1477021.25 | 77787.75 | | | 12.08 | | 0.0 | 23.0 | 20.0 |
| 50% | 1477495.5 | 128600.0 | | | 14.14 | | 4.0 | 27.0 | 25.0 |
| 75% | 1477969.75 | 270525.0 | | | 22.3 | | 5.0 | 31.0 | 28.0 |


In [None]:
#print statistical summary, min, max, mean for food_preparation_time
def print_prep_time_stats(dataframe):
    #use the dataframe argument throughout rather than the global df_foodhub
    stat_summary = dataframe['food_preparation_time'].describe()
    min_time = dataframe['food_preparation_time'].min()
    avg_time = round(dataframe['food_preparation_time'].mean(), 2)
    max_time = dataframe['food_preparation_time'].max()

    print('Summary:', '\n', stat_summary, '\n')
    print('Minimum:', min_time)
    print('Average:', avg_time)
    print('Maximum:', max_time)

print_prep_time_stats(df_foodhub)

#### **Observations:**
- Once an order has been submitted, food preparation time ranges from 20 to 35 minutes.

- The average food preparation time is 27.37 minutes.

- A standard deviation of 4.63 minutes indicates consistency.

- Variations are to be expected as different meals require different inputs and prep times.



### **Question 5:** How many orders are not rated?

In [None]:
#returns the total count of non-rated orders
print(f"Non-rated: {len(df_foodhub[df_foodhub['rating'] == 0])}")

#### **Observations:**
- A total of 736 orders, about 38.8% of the 1,898 orders, were not rated.


# **Exploratory Data Analysis (EDA)**

## **Univariate Analysis**

### **Question 6:** Explore all the variables and provide Observations for their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.)

#### **Cost of The Order**



In [None]:
#histogram
sns.histplot(df_foodhub['cost_of_the_order'], kde=True, color='blue')
plt.title('Distribution of Order Cost')
plt.xlabel('Order Cost ($)')
plt.ylabel('Frequency')
plt.xticks(np.arange(2, 38, step=2))
plt.tight_layout()
plt.show()

#boxplot
sns.boxplot(x=df_foodhub['cost_of_the_order'], color='cornflowerblue')
plt.xlabel('Order Cost ($)')
plt.xticks(np.arange(2, 38, step=2))
plt.tight_layout()
plt.show()


##### **Observations for Cost of Order**

**Hist Plot:**
- The histplot shows us a wide distribution of order cost.
- The distribution is slightly right-skewed, with most orders in the lower end of the price range, between 10 and 15 dollars.
- This suggests that the majority of orders are lower-cost, smaller meals for single individuals, or that there is a common pricing strategy among the restaurants.

**Box Plot:**
- The whiskers on the boxplot confirm a wide distribution of order cost.
- The distribution of order cost is right-skewed with a median around $14.
- There appear to be some outliers on the higher end, indicating higher-end or premium items.
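
Skewness and outliers can also be checked numerically rather than read off the plots. The sketch below uses a hypothetical toy `costs` Series (not the real data) and the common 1.5 × IQR whisker rule, which is also the default that seaborn's boxplot uses to draw outlier points.

```python
import pandas as pd

# Hypothetical toy order costs; in the notebook this would be df_foodhub['cost_of_the_order']
costs = pd.Series([5, 8, 10, 12, 12, 13, 14, 15, 16, 35])

# Positive skewness means a longer tail on the high-cost side
skewness = costs.skew()

# Whisker limits under the 1.5 * IQR rule
q1, q3 = costs.quantile(0.25), costs.quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are the ones a boxplot would draw individually
outliers = costs[(costs < lower_fence) | (costs > upper_fence)]

print(f'skewness: {skewness:.2f}')
print(f'fences: {lower_fence} to {upper_fence}')
print(outliers.tolist())
```

Running the same steps on the real column would confirm whether any orders actually fall beyond the upper fence.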



#### **Food Preparation Time**

In [None]:
#food preparation time

#histplot

sns.histplot(df_foodhub['food_preparation_time'], kde=True, color='purple')
plt.title('Distribution of Food Preparation Time')
plt.xlabel('Food Preparation Time')
plt.ylabel('Frequency')
plt.xticks(np.arange(20, 37, step=2))
plt.tight_layout()
plt.show()


#boxplot
sns.boxplot(data=df_foodhub, x='food_preparation_time', color='plum')
plt.xlabel('Food Preparation Time')
plt.ylabel('Box Plot')
plt.xticks(np.arange(20, 37, step=2))
plt.tight_layout()
plt.show()


##### **Observations for Food Prep Time**

**Hist Plot:**
- The histplot shows a relatively uniform distribution of food preparation time between 22 and 34 minutes.
- Many of the bins have roughly the same frequency, indicating a consistent range of food preparation times with little variation.

**Box Plot:**
- The boxplot reveals a median food preparation time of ~27 minutes while confirming a uniform distribution.
- There don't appear to be any visible outliers.


#### **Delivery Time**

In [None]:
#delivery time

#histplot

sns.histplot(df_foodhub['delivery_time'], kde=True, color='orange')
plt.title('Distribution of Delivery Time')
plt.xticks(np.arange(15, 37, step=2))
plt.tight_layout()
plt.show()

#boxplot

sns.boxplot(x=df_foodhub['delivery_time'], color='moccasin')
plt.title('Delivery Time')
plt.xticks(np.arange(15, 37, step=2))
plt.tight_layout()
plt.show()

##### **Observations for Delivery Time**

**Hist Plot:**
- The histplot indicates that a majority of the orders take between 22 and 28 minutes to be delivered.
- The distribution suggests a degree of consistency and efficiency in delivery times.

**Box Plot:**
- The boxplot indicates a median delivery time of 25 minutes.
- No significant outliers are visible, confirming what we see on the histogram and indicating that unusually slow deliveries are rare.

#### **Day of the Week**

In [None]:
#day of the week
plt.figure(figsize=(10, 8))
ax = sns.countplot(x='day_of_the_week', data=df_foodhub, order=['Weekday', 'Weekend'], hue='day_of_the_week', palette='plasma')
plt.title('Distribution of Order by Weekday vs Weekend')
plt.xlabel('Part of Week')
plt.ylabel('Order Count')
plt.ylim(0, 1.2* df_foodhub['day_of_the_week'].value_counts().max())
plt.tight_layout()
for p in ax.patches:
    ax.annotate(format(p.get_height(), '.0f'), (p.get_x() + p.get_width() / 2., p.get_height()), ha = 'center', va = 'center', xytext = (0, 10), textcoords = 'offset points')
plt.show()

##### **Observations for Day of Week**
- The distribution between "Weekday" and "Weekend" shows significantly higher demand on weekends than on weekdays.

- This suggests consumers are more likely to order when they are off work or not in school.

- The increased volume on the weekends may indicate the need for more staff.

#### **Restaurant Names**

In [None]:
#restaurant names

#choose the number of observations to plot
N=10

#gets frequencies
restaurant_cnts = df_foodhub['restaurant_name'].value_counts().head(N)

plt.figure(figsize=(10, 8))
ax = sns.barplot(x=restaurant_cnts.values, y=restaurant_cnts.index, hue=restaurant_cnts.index, palette='plasma', legend=False)
plt.title(f"Top {N} Restaurants")
plt.xlabel('Number of Orders')
plt.ylabel('Restaurant Name')
plt.tight_layout()
plt.show()


##### **Observations for Restaurant**
- A few of the most popular restaurants are a clear choice among consumers. This indicates that there are prominent leaders that may have a competitive advantage of some sort, leading to their high order volume. Remaining committed to these establishments is important, as they likely account for a large share of fee-based revenue.

#### **Cuisine Types**

In [None]:
#cuisine types

#choose the number of observations to plot
N=10

#gets frequencies
cuisines_cnts = df_foodhub['cuisine_type'].value_counts().head(N)

plt.figure(figsize=(10, 8))
ax = sns.barplot(x=cuisines_cnts.values, y=cuisines_cnts.index, hue=cuisines_cnts.index, palette='plasma',legend=False)
plt.title(f'Top {N} Most Popular Cuisine Types')
plt.xlabel('Number of Orders')
plt.ylabel('Cuisine Type')
plt.tight_layout()
plt.show()


##### **Observations for Cuisine Type**
- American cuisine is the most popular cuisine type, followed by Japanese and Italian.
- The least popular cuisine type is Vietnamese.

#### **Orders vs Ratings**

In [None]:
#ratings

#gets frequencies
ratings_cnt = df_foodhub['rating'].value_counts()

plt.figure(figsize=(10, 8))
ax = sns.barplot(x=ratings_cnt.index, y=ratings_cnt.values, hue=ratings_cnt.values, palette='plasma', legend=False)
plt.title('Count of Order Ratings')
plt.xlabel('Ratings')
plt.ylabel('Count')
plt.tight_layout()
for p in ax.patches:
    ax.annotate(format(p.get_height(), '.0f'), (p.get_x() + p.get_width() / 2., p.get_height()), ha = 'center', va = 'center', xytext = (0, 8), textcoords = 'offset points')
plt.show()





##### **Observations for Rating**

- A significant number of orders did not receive a rating.
- There is a large number of high ratings which indicates customers are satisfied with the food and service received.

### **Question 7**: Which are the top 5 restaurants in terms of the number of orders received?

In [None]:
df_foodhub['restaurant_name'].value_counts().reset_index().head(5)

##### **Observations:**
- Shake Shack tops the list of restaurants with 219 orders. Following are The Meatball Shop with 132 orders, Blue Ribbon Sushi with 119 orders, Blue Ribbon Fried Chicken with 96 orders, and Parm with 68 orders.

### **Question 8**: Which is the most popular cuisine on weekends?

In [None]:
df_foodhub[df_foodhub['day_of_the_week'] == 'Weekend']['cuisine_type'].value_counts().reset_index().head(1)

##### **Observations:**
- American cuisine is the most popular weekend choice, with 415 orders.

### **Question 9**: What percentage of the orders cost more than 20 dollars?

In [None]:
#returns Total Orders
ttl_observations = len(df_foodhub)

#returns records greater than $20
ttl_greater_20 = len(df_foodhub[df_foodhub['cost_of_the_order'] > 20])

#returns percent of orders > $20
percent_orders_greater_20 = round((ttl_greater_20 / ttl_observations) * 100, 2)

print(str(percent_orders_greater_20) + '%')


##### **Observations:**
- Orders costing more than $20 represent 29.24% of total orders.
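
A more compact way to get the same kind of percentage is to take the mean of a boolean mask, since `True` counts as 1 and `False` as 0. This is only a sketch on a hypothetical toy `cost` Series; the threshold matches the question.

```python
import pandas as pd

# Hypothetical toy costs; in the notebook this would be df_foodhub['cost_of_the_order']
cost = pd.Series([10.0, 25.0, 30.0, 15.0])

# The mean of a boolean mask is the share of True values
pct_over_20 = round((cost > 20).mean() * 100, 2)

print(f'{pct_over_20}%')
```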

### **Question 10**: What is the mean order delivery time?

In [None]:
df_foodhub['delivery_time'].mean().round(2)

#### **Observations:**
- The average delivery time for prepared orders is 24.16 minutes.


### **Question 11:** The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed.

In [None]:
# Top 3 Customers

#groups sorts and renames the column header to a more descriptive name
df_foodhub.groupby('customer_id')['order_id'].size().sort_values(ascending=False).to_frame().rename(columns={'order_id': 'order_cnt'}).reset_index().head(3)

##### **Observations:**
- The customer_id: 52832 is the most frequent customer, having placed 13 orders.

- The customer_id: 47440 is the second most frequent customer, with 10 orders.

- The customer_id: 83287 is the third most frequent customer, having placed 9 orders.
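
The grouping above can also be written with `value_counts()`, which already sorts by frequency. The following is a sketch on a hypothetical toy `customer_id` Series; the column name `order_cnt` mirrors the one used in the cell above.

```python
import pandas as pd

# Hypothetical toy customer IDs, one entry per order
customer_id = pd.Series([11, 22, 11, 33, 11, 22, 44, 55, 33, 33, 33])

# value_counts() sorts by frequency in descending order, so head(3) gives the top 3
top3 = (
    customer_id.value_counts()
    .head(3)
    .rename_axis('customer_id')
    .reset_index(name='order_cnt')
)

print(top3)
```

Note that `head(3)` cuts off ties arbitrarily; if several customers share the third-highest count, the result depends on row order.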


# **Multivariate Analysis**

### **Question 12**: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables)

#### **Pairplot**

In [None]:
#pairplot
#selects the names of numeric columns and drops unique identifiers not needed for analysis
numeric_column_list = df_foodhub.select_dtypes(include='number').drop(columns=['order_id', 'customer_id']).columns.tolist()

sns.pairplot(df_foodhub, vars=numeric_column_list, diag_kind='kde')
plt.show()

#### **Correlation Heatmap**

In [None]:
#correlation analysis

#selects numeric columns and drops unique identifiers not needed for analysis
numeric_df = df_foodhub.select_dtypes(include='number').drop(columns=['order_id', 'customer_id'])

plt.figure(figsize=(10, 8))
sns.heatmap(numeric_df.corr(), annot=True, vmin=-1, vmax=1, cmap='coolwarm')
#tight_layout is applied after drawing so it can account for the annotations and colorbar
plt.tight_layout()
plt.show()



##### **Observations:**
- The heatmap shows low correlation among the numeric variables.
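
Because unrated orders were encoded as 0, any correlation involving `rating` mixes "no rating given" with genuinely low scores. One way to check how much this matters is to mask the zeros before calling `corr()`, which ignores missing values pair by pair. The sketch below uses a hypothetical toy frame, not the real data.

```python
import pandas as pd

# Hypothetical toy data; a rating of 0 stands for 'Not given', as in this notebook
toy = pd.DataFrame({
    'rating': [0, 5, 4, 0, 3, 5],
    'cost_of_the_order': [12.0, 30.0, 22.0, 9.0, 14.0, 25.0],
    'delivery_time': [20, 25, 30, 18, 28, 22],
})

# where() keeps values that satisfy the condition and sets the rest to NaN
rated = toy.assign(rating=toy['rating'].where(toy['rating'] > 0))

corr_all = toy.corr()      # zeros treated as real ratings
corr_rated = rated.corr()  # unrated rows dropped for pairs that involve rating

print(corr_all.round(2))
print(corr_rated.round(2))
```

Pairs that do not involve `rating` are unchanged, since all rows remain available for them.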

#### **Cuisine Cost Distribution**



In [None]:
# creates a boxplot for each cuisine type to analyze the cost distribution.
plt.figure(figsize=(10, 8))
sns.boxplot(x='cost_of_the_order', y='cuisine_type', data=df_foodhub, hue='cuisine_type', palette='Set2')
plt.title('Cost Distribution per Cuisine')
plt.xlabel('Cost')
plt.ylabel('Cuisine')
plt.tight_layout()
plt.show()

##### **Observations:**
- The boxplot indicates a high variability of costs across cuisines. This is expected as many menus have standard and premium items.

- Korean and Vietnamese cuisines appear to be budget friendly options.

- Italian, American, Chinese, and Japanese cuisines show similar cost distributions, suggesting comparable price ranges.

#### **Cuisine Demand - Weekday vs Weekend**

In [None]:
plt.figure(figsize=(10, 8))
sns.countplot(y='cuisine_type', hue='day_of_the_week', data=df_foodhub, palette='Set2',
              order=df_foodhub['cuisine_type'].value_counts().index)
plt.title('Cuisine Demand - Weekday vs Weekend')
plt.xlabel('Number of Orders')
plt.ylabel('Cuisine Type')
plt.legend(title='Day of the Week')
plt.tight_layout()
plt.show()

##### **Observations:**
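- There is an obvious increase in demand on the weekends.
- Overall, weekend orders (1,351) outnumber weekday orders (547) by roughly 2.5 to 1, and weekend demand appears to be at least double weekday demand across cuisines.
- The relative popularity of cuisines appears stable between weekdays and weekends.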
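
The weekend-to-weekday ratio can be worked out from the counts already reported by `describe()` (1,898 orders in total, 1,351 of them on weekends). The per-day figures below rest on an assumption that is not stated in the data: that the orders span whole weeks, so there are 5 weekdays for every 2 weekend days.

```python
# Counts taken from the describe() output earlier in the notebook
total_orders = 1898
weekend_orders = 1351
weekday_orders = total_orders - weekend_orders

# Raw ratio of weekend to weekday order volume
volume_ratio = weekend_orders / weekday_orders

# Assumption: whole weeks are covered, i.e. 2 weekend days per 5 weekdays
per_day_ratio = (weekend_orders / 2) / (weekday_orders / 5)

print(weekday_orders)
print(round(volume_ratio, 2))
print(round(per_day_ratio, 2))
```

Under that assumption, the gap on a per-day basis is considerably larger than the raw volume ratio suggests.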

#### **Cuisine Ratings**


In [None]:
plt.figure(figsize=(10, 8))
sns.countplot(y='cuisine_type', hue='rating', data=df_foodhub, palette='Set3',
              order=df_foodhub['cuisine_type'].value_counts().index)
plt.title('Rating by Cuisine')
plt.xlabel('Number of Orders')
plt.ylabel('Cuisine Type')
plt.legend(title='Rating')
plt.tight_layout()
plt.show()

##### **Observations:**
- The most popular cuisines also receive the largest number of high ratings.
- Less popular cuisines have fewer ratings.
- Rating variability appears to be consistent across cuisines.

#### **Rating Correlations**


In [None]:
#plots 2x2 plot comparing ratings to time and cost
def plot_rating_insights(df, x, palette='plasma', legend=False):

    #calculate total time taken for food preparation and delivery
    df['total_time_taken'] = df['food_preparation_time'] + df['delivery_time']

    # multi-panel visualization
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # rating vs food prep time
    sns.boxplot(x=x, y='food_preparation_time', data=df, ax=axes[0, 0], hue=x, palette=palette, legend=legend)
    axes[0, 0].set_title('Rating vs Food Prep Time')

    # rating vs delivery time
    sns.boxplot(x=x, y='delivery_time', data=df, ax=axes[0, 1], hue=x, palette=palette, legend=legend)
    axes[0, 1].set_title('Rating vs Delivery Time')

    # rating vs time
    sns.boxplot(x=x, y='total_time_taken', data=df, ax=axes[1, 0], hue=x, palette=palette, legend=legend)
    axes[1, 0].set_title('Rating vs Total Time')

    # rating vs cost
    sns.boxplot(x=x, y='cost_of_the_order', data=df, ax=axes[1, 1], hue=x, palette=palette, legend=legend)
    axes[1, 1].set_title('Rating vs Cost')

    plt.tight_layout()
    plt.show()

#call the function
plot_rating_insights(df_foodhub, x='rating', palette='plasma')


##### **Observations:**
- It appears that ratings are more or less unaffected by the time it takes to get the order or by its cost.
- This can indicate high levels of customer satisfaction and efficiency in both food prep and delivery times.
- Customers should be incentivized to provide ratings, as this is a valuable measure for the business.

#### **Weekly Delivery Times**

In [None]:
#relationship between day of the week and delivery time

plt.figure(figsize=(10, 8))
sns.boxplot(x='day_of_the_week', y='delivery_time', data=df_foodhub, hue='day_of_the_week', palette='Set2')
plt.title('Delivery Time by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Delivery Time (minutes)')
plt.tight_layout()
plt.show()

##### **Observations:**
  - Delivery times are longer on weekdays than on weekends; the means differ by about 5.87 minutes (28.34 vs 22.47). This could be due to traffic during the week.
  - There do not appear to be any outliers.

### **Question 13:** The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer.

In [None]:
# filter out rows where there is no rating as these will skew the outcomes
df_foodhub_filtered = df_foodhub[df_foodhub['rating'] != 0]

# group by restaurant name and calculate the average rating and the count of ratings for each restaurant
df_foodhub_grouped = df_foodhub_filtered.groupby('restaurant_name').agg(
    mean_rating=('rating', 'mean'),
    rating_count=('rating', 'count')
).reset_index()

# get restaurants that have a rating count of more than 50 and an average rating greater than 4
# .copy() avoids a SettingWithCopyWarning when the result is modified below
df_foodhub_promo_qualified = df_foodhub_grouped[(df_foodhub_grouped['rating_count'] > 50) & (df_foodhub_grouped['mean_rating'] > 4)].copy()

# sort values and drop index
df_foodhub_promo_qualified = df_foodhub_promo_qualified.sort_values('rating_count', ascending=False).reset_index(drop=True)

df_foodhub_promo_qualified.head()

##### **Observations:**
- The Meatball Shop, Blue Ribbon Fried Chicken, Shake Shack, and Blue Ribbon Sushi have qualified for the promotional offer.

### **Question 14:** The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders.

In [None]:
#returns calculated revenue
def calculate_revenue(cost):
    return np.where(cost > 20, cost * 0.25,
            np.where((5 < cost) & (cost <= 20), cost * 0.15, 0))

df_foodhub['net_revenue'] = calculate_revenue(df_foodhub['cost_of_the_order'].values)

#sums and rounds net_revenue values then stores the result as a variable
net_revenue = round(df_foodhub['net_revenue'].sum(), 2)

print(net_revenue)

##### **Observations:**
- FoodHub collected $6,166.30 in net revenue.
- Charging a higher fee for orders greater than $20 appears to be strategic in nature. More analysis could be done to determine if revenue is being optimized without losing customers.
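
To make the fee tiers concrete, here is a small worked example on hypothetical order costs. It uses `np.select` instead of the nested `np.where` above (a swapped-in but equivalent way to express tiered rules): conditions are checked in order and the first match wins, so `cost > 20` must come before `cost > 5`.

```python
import numpy as np

# Hypothetical order costs chosen to hit each tier and both boundaries
cost = np.array([4.0, 5.0, 12.0, 20.0, 25.0])

# Conditions are evaluated in order; the first True one determines the rate
conditions = [cost > 20, cost > 5]
rates = [0.25, 0.15]
fee_rate = np.select(conditions, rates, default=0.0)

# Fee per order and total revenue on the toy data
fee = cost * fee_rate

print(fee)
print(round(fee.sum(), 2))
```

Orders of exactly $5 earn no fee and orders of exactly $20 fall in the 15% tier, because both thresholds in the question are strict ("greater than").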

### **Question 15:** The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.)

In [None]:
#function to calculate the percentage of orders whose total time (prep + delivery) exceeds 60 minutes

def calculate_percentage_over_60(df):

    #sums total time and deliveries over 60
    ttl_deliveries_60 = len(df[(df['food_preparation_time'] + df['delivery_time']) > 60])

    #total orders
    ttl_orders = len(df)

    #percent over 60
    prcnt_60 = round((ttl_deliveries_60 / ttl_orders) * 100, 2)

    return prcnt_60

percentage_over_60 = calculate_percentage_over_60(df_foodhub)
print(percentage_over_60)


##### **Observations:**
- 10.54% of FoodHub orders take more than 60 minutes to deliver from the time the order was placed.

### **Question 16:** The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends?

In [None]:
# Calculates the mean delivery time for weekdays and weekends
mean_delivery_time_weekdays = round(df_foodhub[df_foodhub['day_of_the_week'] == 'Weekday']['delivery_time'].mean(),2)
mean_delivery_time_weekends = round(df_foodhub[df_foodhub['day_of_the_week'] == 'Weekend']['delivery_time'].mean(),2)

print('Weekday Avg Delivery Time:', mean_delivery_time_weekdays)
print('Weekend Avg Delivery Time:', mean_delivery_time_weekends)

print(f"It took an average of {round((mean_delivery_time_weekdays - mean_delivery_time_weekends), 2)} mins more to deliver the food on weekdays")

##### **Observations:**
- Mean delivery time on the weekdays is 28.34 minutes versus 22.47 minutes for weekend orders.
- It took an average of 5.87 mins more to deliver the food on weekdays.
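
The same comparison can be expressed with a single `groupby`, which scales to more than two categories without repeating the filter. This is a sketch on a hypothetical toy frame; the column names match the dataset.

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be df_foodhub
toy = pd.DataFrame({
    'day_of_the_week': ['Weekday', 'Weekday', 'Weekend', 'Weekend', 'Weekend'],
    'delivery_time': [30, 26, 20, 24, 22],
})

# Mean delivery time per category, indexed by day_of_the_week
mean_by_day = toy.groupby('day_of_the_week')['delivery_time'].mean().round(2)

# Difference between the two group means
diff = round(mean_by_day['Weekday'] - mean_by_day['Weekend'], 2)

print(mean_by_day)
print(diff)
```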

### **Question 17:** What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.)

##### **Conclusion and Recommendations**

##### **Customer Ratings**
> ###### **Conclusion:**
>
> A notable number of orders are not receiving ratings. This suggests potential gaps in customer engagement or satisfaction measurement efforts.
>
> The ratings customers leave appear to be largely independent of the operational metrics examined (preparation time, delivery time, and cost). This suggests that other factors, such as food quality or service, have a more pronounced impact on the ratings provided.
>
> ###### **Recommendations:**
>
> Focus on enhancing food quality and service, and implement strategies that incentivize customers to provide feedback. A larger number of ratings will give the business the data necessary to gain deeper insights into what drives customer satisfaction.



##### **Cuisine Preference and Price Sensitivity**

> ###### **Conclusion:**
> There are significant variations in the costs and the popularity of cuisines. Certain cuisines have a more budget-friendly feel, others are priced to appeal to the majority, while others are clear outliers, exhibiting a significantly higher than average cost. This reflects diverse customer preferences and price sensitivity.
>
> ###### **Recommendations:**
>
> Expand the variety of cuisine options available, primarily for those in the budget-friendly category. Address the high-cost outliers by reviewing pricing strategies, promotional offers, and creating strategic partnerships with restaurants with a commitment to customer satisfaction and profits.

##### **Order Revenue Maximization**

> ###### **Conclusion:**
> There is a noticeable surge in demand on weekends compared with weekdays. Weekend order volume (1,351 orders) is roughly 2.5 times weekday volume (547 orders).
>
> ###### **Recommendation:**
> Continue to focus on weekend deliveries as the main revenue driver while reducing the number of incentives or promotions offered during this timeframe. To lift weekday volume, deploy a combination of targeted incentives, promotions, meal deals, discounts, and featured restaurants. Directing marketing spend this way can encourage customers to choose FoodHub during their workweek.

##### **Operational Efficiency**

> ###### **Conclusion:**
> Despite the lack of a strong correlation between food preparation and delivery times and customer ratings, consistent performance in these areas is crucial for meeting customer expectations.
>
> ###### **Recommendation:**
>
> Maintain operational efficiency to ensure customers get what they have come to expect.
> Additionally, it's advisable to explore innovative methods that will cut down on food prep and delivery times, while also ensuring food quality and customer service remain unaffected.

##### **Strategic Fee Structure**

> ###### **Conclusion:**
> The existing fee strategy of charging 25% on orders over $20 and 15% on those between $5 and $20 may not be fully optimizing fee-based revenue.
>
> ###### **Recommendation:**
> Run optimization scenarios and gather feedback from restaurants regarding how the fee affects their overall margins and whether the fee is being passed to the consumer. An optimal balance is critical in order to maintain healthy relationships with restaurants while not stifling their ability to provide good food at a good price.

##### **AI/ML Adoption**

> ###### **Conclusion:**
> Given the increasing prevalence of and access to AI/ML technologies, investing in them now is prudent for building a sustained and durable competitive advantage.
>
> ###### **Recommendation:**
> AI/ML can be used for peak demand pricing strategies, personalized pricing and promotional offers, tailored menu suggestions, and loyalty incentive programs. Other areas AI/ML will show a sizable impact are sentiment analysis, real-time demand forecasting, or delivery route optimization.
>
> By implementing the recommendations provided, FoodHub will be able to safeguard its market competitiveness and strengthen customer satisfaction, brand loyalty, and strategic partnerships.