# Food Hub 

## Context
FoodHub is a restaurant aggregator app that brings local restaurants together in one place. It lets customers place an order with their favorite restaurant and have it delivered.
A unique order is created when a customer places an order with a particular restaurant. FoodHub then assigns the delivery to one of its partner drivers, who picks up the food from the restaurant and delivers it to the customer.

## Objective
This analysis is exploratory: it looks for general insights about demand in the ecosystem. Does a particular cuisine do better in a particular region? Is there a time of day when customers prefer to order?

In [12]:
# Import Libraries for data manipulation
import pandas as pd
import numpy as np

# Import Libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

In [13]:
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [14]:
# Read the data
df = pd.read_csv('foodhub_order.csv')
# Return first 5 rows
df.head()

| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | Not given | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | Not given | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.2 | Weekend | 3 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 |


##### Observation  
The data has 9 columns. Each row represents a unique order. 
Some customers did not give ratings.

In [16]:
df.shape

(1898, 9)

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB


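* The data set has 1898 rows and 9 columns.
* Order ID and customer ID are both `int64`, which is appropriate for identifiers.
* Restaurant name, cuisine type, and day of the week are `object` columns. We will convert these to `category`.
* Rating is stored as `object` because unrated orders contain the text `Not given`; the ratings themselves are whole numbers.
* Cost of the order is `float64`, which is fine for monetary amounts.
* Food preparation time and delivery time are `int64`. They are durations in minutes rather than timestamps, so an integer type is appropriate and no datetime conversion is needed.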
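The `rating` column is read as `object` because unrated orders carry the text `Not given`. One possible way to get a numeric view of it is `pd.to_numeric` with `errors="coerce"`, which turns non-numeric entries into `NaN`. The sketch below is only an illustration: it uses the five rows shown in the `df.head()` output rather than the full file, and the column name `rating_num` is a hypothetical choice, not part of the data set.

```python
import pandas as pd

# Small sample taken from the first five rows shown earlier (not the full file)
sample = pd.DataFrame({
    "order_id": [1477147, 1477685, 1477070, 1477334, 1478249],
    "rating": ["Not given", "Not given", "5", "3", "4"],
})

# 'rating_num' is a hypothetical column name; 'Not given' becomes NaN
sample["rating_num"] = pd.to_numeric(sample["rating"], errors="coerce")

n_unrated = sample["rating_num"].isna().sum()   # count of unrated orders in the sample
mean_rating = sample["rating_num"].mean()       # NaN values are skipped by default

print(n_unrated, mean_rating)
```

Keeping the original `rating` column alongside the numeric one preserves the distinction between "not rated" and any genuinely missing values.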


In [21]:
# Converting low-cardinality text columns to 'category' reduces the memory
# footprint of the data set, which can improve performance.
df['restaurant_name'] = df['restaurant_name'].astype('category')
df['cuisine_type'] = df['cuisine_type'].astype('category')
df['day_of_the_week'] = df['day_of_the_week'].astype('category')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   order_id               1898 non-null   int64   
 1   customer_id            1898 non-null   int64   
 2   restaurant_name        1898 non-null   category
 3   cuisine_type           1898 non-null   category
 4   cost_of_the_order      1898 non-null   float64 
 5   day_of_the_week        1898 non-null   category
 6   rating                 1898 non-null   object  
 7   food_preparation_time  1898 non-null   int64   
 8   delivery_time          1898 non-null   int64   
dtypes: category(3), float64(1), int64(4), object(1)
memory usage: 102.7+ KB
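The memory figure reported by `df.info()` drops from 133.6+ KB to 102.7+ KB after the conversion. The `+` indicates that the contents of `object` columns are not fully counted, so the real saving may be larger. A minimal sketch of how the effect could be measured with `memory_usage(deep=True)`, using a synthetic column (an assumption for illustration, not the actual data):

```python
import pandas as pd

# Synthetic column: a few distinct cuisine names repeated many times
cuisine = pd.Series(["Korean", "Japanese", "Mexican", "American"] * 500)

# deep=True counts the actual string storage, not just the pointers
as_object = cuisine.memory_usage(deep=True)
as_category = cuisine.astype("category").memory_usage(deep=True)

print(f"object:   {as_object} bytes")
print(f"category: {as_category} bytes")
```

The saving comes from storing each distinct value once and replacing the column with small integer codes, so it is largest for columns with few unique values.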


In [22]:
# summary statistics of the dataset
df.describe()

| | order_id | customer_id | cost_of_the_order | food_preparation_time | delivery_time |
|---|---|---|---|---|---|
| count | 1898.0 | 1898.0 | 1898.0 | 1898.0 | 1898.0 |
| mean | 1477495.5 | 171168.4784 | 16.49885 | 27.37197 | 24.16175 |
| std | 548.04972 | 113698.13974 | 7.48381 | 4.63248 | 4.97264 |
| min | 1476547.0 | 1311.0 | 4.47 | 20.0 | 15.0 |
| 25% | 1477021.25 | 77787.75 | 12.08 | 23.0 | 20.0 |
| 50% | 1477495.5 | 128600.0 | 14.14 | 27.0 | 25.0 |
| 75% | 1477969.75 | 270525.0 | 22.2975 | 31.0 | 28.0 |
| max | 1478444.0 | 405334.0 | 35.41 | 35.0 | 33.0 |


* The average order cost is about $16.50, ranging from $4.47 to $35.41.
* Three quarters (75%) of orders cost about $22.30 or less.
* The average food preparation time is about 27.4 minutes, ranging from 20 to 35 minutes.
* 75% of orders are prepared in 31 minutes or less.
* The average delivery time is about 24.2 minutes, ranging from 15 to 33 minutes.
* 75% of deliveries take 28 minutes or less.
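The percentile statements above can also be read off individually with `Series.quantile` instead of the full `describe()` table. A minimal sketch, assuming only the five order costs visible in the `df.head()` output, so the values will differ from those of the full data set:

```python
import pandas as pd

# Costs of the first five orders shown earlier (illustrative sample only)
cost = pd.Series([30.75, 12.08, 12.23, 29.2, 11.59], name="cost_of_the_order")

q75 = cost.quantile(0.75)   # value at or below which 75% of the sample falls
median = cost.median()      # 50th percentile
lowest, highest = cost.min(), cost.max()

print(f"75th percentile: {q75:.2f}, median: {median:.2f}")
print(f"range: {lowest:.2f} to {highest:.2f}")
```

On the real data the same calls would be applied to `df['cost_of_the_order']`, `df['food_preparation_time']`, and `df['delivery_time']`.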

In [None]:
# How many orders are not rated?
df.rating.value_counts()

Not given    736
5            588
4            386
3            188
Name: rating, dtype: int64

**736 of the 1,898 orders (about 38.8%) were not rated.**
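To express these counts as shares of all orders, `value_counts(normalize=True)` returns proportions instead of raw counts. The sketch below rebuilds a ratings column from the counts printed above rather than reading the file, and it assumes the ratings are stored as strings, as the `object` dtype suggests.

```python
import pandas as pd

# Ratings reconstructed from the value counts shown above (assumed to be strings)
ratings = pd.Series(
    ["Not given"] * 736 + ["5"] * 588 + ["4"] * 386 + ["3"] * 188,
    name="rating",
)

# Proportion of each rating value, largest first
shares = ratings.value_counts(normalize=True)

# Percentage of orders without a rating
pct_unrated = round(shares["Not given"] * 100, 1)

print(shares)
print(pct_unrated)
```

With the real data the equivalent call would be `df['rating'].value_counts(normalize=True)`.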

# Exploratory Data Analysis

## Univariate Analysis