# Restaurant Operations Analysis

### The Situation

You've been hired as a Data Analyst for the Taste of the World Cafe, a restaurant with a diverse menu and generous portions.

### The Assignment

The Taste of the World Cafe debuted a new menu at the start of the year.

You've been asked to dig into the customer data to see which menu items are selling well, which are not, and what the top customers seem to like best.

### The Objectives

1) Explore the `menu_items` table to get an idea of what's on the new menu
2) Explore the `order_details` table to get an idea of the data that's been collected
3) Use both tables to understand how customers are reacting to the new menu

## Imports

In [1]:
import polars as pl

## Objective 1: Explore the Items Table

1) View the `menu_items` table and write a query to find the number of items on the menu.

In [2]:
menu_items = pl.read_csv("restaurant-orders/menu_items.csv")

menu_items.describe()

statistic,menu_item_id,item_name,category,price
str,f64,str,str,f64
"""count""",32.0,"""32""","""32""",32.0
"""null_count""",0.0,"""0""","""0""",0.0
"""mean""",116.5,,,13.285937
"""std""",9.380832,,,3.858071
"""min""",101.0,"""California Roll""","""American""",5.0
"""25%""",109.0,,,10.5
"""50%""",117.0,,,13.95
"""75%""",124.0,,,15.5
"""max""",132.0,"""Veggie Burger""","""Mexican""",19.95


2) What are the least and most expensive items on the menu?

In [3]:
menu_items.filter(pl.col("price") == pl.col("price").max())

menu_item_id,item_name,category,price
i64,str,str,f64
130,"""Shrimp Scampi""","""Italian""",19.95


In [4]:
menu_items.filter(pl.col("price") == pl.col("price").min())

menu_item_id,item_name,category,price
i64,str,str,f64
113,"""Edamame""","""Asian""",5.0


3) How many Italian dishes are on the menu? What are the least and most expensive Italian dishes on the menu?

In [5]:
italian_dishes = menu_items.filter(pl.col("category") == "Italian")

italian_dishes

menu_item_id,item_name,category,price
i64,str,str,f64
124,"""Spaghetti""","""Italian""",14.5
125,"""Spaghetti & Meatballs""","""Italian""",17.95
126,"""Fettuccine Alfredo""","""Italian""",14.5
127,"""Meat Lasagna""","""Italian""",17.95
128,"""Cheese Lasagna""","""Italian""",15.5
129,"""Mushroom Ravioli""","""Italian""",15.5
130,"""Shrimp Scampi""","""Italian""",19.95
131,"""Chicken Parmesan""","""Italian""",17.95
132,"""Eggplant Parmesan""","""Italian""",16.95


In [6]:
italian_dishes.describe()

statistic,menu_item_id,item_name,category,price
str,f64,str,str,f64
"""count""",9.0,"""9""","""9""",9.0
"""null_count""",0.0,"""0""","""0""",0.0
"""mean""",128.0,,,16.75
"""std""",2.738613,,,1.865811
"""min""",124.0,"""Cheese Lasagna""","""Italian""",14.5
"""25%""",126.0,,,15.5
"""50%""",128.0,,,16.95
"""75%""",130.0,,,17.95
"""max""",132.0,"""Spaghetti & Meatballs""","""Italian""",19.95


In [7]:
italian_dishes.filter(pl.col("price") == pl.col("price").max())

menu_item_id,item_name,category,price
i64,str,str,f64
130,"""Shrimp Scampi""","""Italian""",19.95


In [8]:
italian_dishes.filter(pl.col("price") == pl.col("price").min())

menu_item_id,item_name,category,price
i64,str,str,f64
124,"""Spaghetti""","""Italian""",14.5
126,"""Fettuccine Alfredo""","""Italian""",14.5


4) How many dishes are in each category? What is the average dish price within each category?

In [9]:
(
    menu_items
    .group_by("category")
    .agg(
        num_dish = pl.len(),
        avg_price = pl.col('price').mean()
    )
)

category,num_dish,avg_price
str,u32,f64
"""American""",6,10.066667
"""Asian""",8,13.475
"""Mexican""",9,11.8
"""Italian""",9,16.75


## Objective 2: Explore the Orders Table

1) View the `order_details` table. What is the date range of the table?

In [10]:
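# Scan lazily; missing values are stored in the file as the string "NULL".
order_details = pl.scan_csv(
    "restaurant-orders/order_details.csv",
    null_values="NULL",
)

# Parse the month/day/two-digit-year strings into a Date column.
order_details = order_details.with_columns(
    pl.col("order_date").str.to_date(format="%m/%d/%y")
)

order_details.select("order_date").describe()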

statistic,order_date
str,str
"""count""","""12234"""
"""null_count""","""0"""
"""mean""","""2023-02-14 11:01:44.168710"""
"""std""",
"""min""","""2023-01-01"""
"""25%""","""2023-01-23"""
"""50%""","""2023-02-14"""
"""75%""","""2023-03-09"""
"""max""","""2023-03-31"""


2) How many orders were made within this date range? How many items were ordered within this date range?

In [11]:
order_details.select("order_id").unique().count().collect()

order_id
u32
5370


In [12]:
grouped_orders = order_details.group_by("order_id").agg(num_items = pl.len())
grouped_orders.describe()

statistic,order_id,num_items
str,f64,f64
"""count""",5370.0,5370.0
"""null_count""",0.0,0.0
"""mean""",2685.5,2.278212
"""std""",1550.329804,1.679162
"""min""",1.0,1.0
"""25%""",1343.0,1.0
"""50%""",2686.0,2.0
"""75%""",4028.0,3.0
"""max""",5370.0,14.0


In [13]:
order_details.describe()

statistic,order_details_id,order_id,order_date,order_time,item_id
str,f64,f64,str,str,f64
"""count""",12234.0,12234.0,"""12234""","""12234""",12097.0
"""null_count""",0.0,0.0,"""0""","""0""",137.0
"""mean""",6117.5,2691.927415,"""2023-02-14 11:01:44.168710""",,115.202282
"""std""",3531.795931,1546.026261,,,9.38758
"""min""",1.0,1.0,"""2023-01-01""","""10:00:17 PM""",101.0
"""25%""",3059.0,1351.0,"""2023-01-23""",,107.0
"""50%""",6118.0,2710.0,"""2023-02-14""",,114.0
"""75%""",9176.0,4020.0,"""2023-03-09""",,123.0
"""max""",12234.0,5370.0,"""2023-03-31""","""9:59:50 PM""",132.0


3) Which orders had the most items?

In [14]:
grouped_orders.sort("num_items", descending=True).collect()

order_id,num_items
i64,u32
2675,14
3473,14
1957,14
4482,14
440,14
…,…
297,1
2658,1
2003,1
1566,1


4) How many orders had more than 12 items?

In [15]:
grouped_orders.filter(pl.col("num_items") > 12).collect()

order_id,num_items
i64,u32
2126,13
2725,13
4836,13
3583,13
1957,14
…,…
2075,13
330,14
440,14
443,14


## Objective 3: Analyze Customer Behavior

1) Combine the `menu_items` and `order_details` tables into a single table.

In [16]:
combined_menu_order = order_details.join(
    menu_items.lazy(), 
    left_on="item_id", 
    right_on="menu_item_id",
    how="left"
)
combined_menu_order.head().collect()

order_details_id,order_id,order_date,order_time,item_id,item_name,category,price
i64,i64,date,str,i64,str,str,f64
1,1,2023-01-01,"""11:38:36 AM""",109,"""Korean Beef Bowl""","""Asian""",17.95
2,2,2023-01-01,"""11:57:40 AM""",108,"""Tofu Pad Thai""","""Asian""",14.5
3,2,2023-01-01,"""11:57:40 AM""",124,"""Spaghetti""","""Italian""",14.5
4,2,2023-01-01,"""11:57:40 AM""",117,"""Chicken Burrito""","""Mexican""",12.95
5,2,2023-01-01,"""11:57:40 AM""",129,"""Mushroom Ravioli""","""Italian""",15.5


2) What were the least and most ordered items? What categories were they in?

In [17]:
(
    combined_menu_order
    .group_by("item_name")
    .agg(
        item_freq = pl.len(),
        category = pl.col("category").first()  # an item maps to a single category
    )
    .sort("item_freq", descending=True)
    .collect()
)

item_name,item_freq,category
str,u32,str
"""Hamburger""",622,"""American"""
"""Edamame""",620,"""Asian"""
"""Korean Beef Bowl""",588,"""Asian"""
"""Cheeseburger""",583,"""American"""
"""French Fries""",571,"""American"""
…,…,…
"""Steak Tacos""",214,"""Mexican"""
"""Cheese Lasagna""",207,"""Italian"""
"""Potstickers""",205,"""Asian"""
,137,


In [18]:
(
    combined_menu_order
    .group_by("category")
    .agg(
        category_freq = pl.len()
    )
    .sort("category_freq", descending=True)
    .collect()
)

category,category_freq
str,u32
"""Asian""",3470
"""Italian""",2948
"""Mexican""",2945
"""American""",2734
,137


3) What were the top 5 orders by total spend?

In [19]:
(
    combined_menu_order
    .group_by("order_id")
    .agg(
        order_rev = pl.col("price").sum(),
        order_date = pl.col("order_date").min(),
        order_time = pl.col("order_time").min(),
        items_ordered = pl.col("item_name"),
        category_ordered = pl.col("category")
    )
    .sort("order_rev", descending=True)
    .head(5)
    .collect()
)

order_id,order_rev,order_date,order_time,items_ordered,category_ordered
i64,f64,date,str,list[str],list[str]
440,192.15,2023-01-08,"""12:16:34 PM""","[""Steak Tacos"", ""Hot Dog"", … ""Eggplant Parmesan""]","[""Mexican"", ""American"", … ""Italian""]"
2075,191.05,2023-02-04,"""2:03:04 PM""","[""Orange Chicken"", ""Chicken Tacos"", … ""Eggplant Parmesan""]","[""Asian"", ""Mexican"", … ""Italian""]"
1957,190.1,2023-02-02,"""2:50:01 PM""","[""Orange Chicken"", ""Hot Dog"", … ""Eggplant Parmesan""]","[""Asian"", ""American"", … ""Italian""]"
330,189.7,2023-01-06,"""1:27:11 PM""","[""Orange Chicken"", ""Hot Dog"", … ""Potstickers""]","[""Asian"", ""American"", … ""Asian""]"
2675,185.1,2023-02-14,"""2:41:49 PM""","[""Hamburger"", ""Cheeseburger"", … ""Eggplant Parmesan""]","[""American"", ""American"", … ""Italian""]"


4) View the details of the highest-spend order. What insights can you gather from the results?

In [20]:
highest_spend_order = combined_menu_order.filter(pl.col("order_id") == 440)
highest_spend_order.collect()

order_details_id,order_id,order_date,order_time,item_id,item_name,category,price
i64,i64,date,str,i64,str,str,f64
1003,440,2023-01-08,"""12:16:34 PM""",116,"""Steak Tacos""","""Mexican""",13.95
1004,440,2023-01-08,"""12:16:34 PM""",103,"""Hot Dog""","""American""",9.0
1005,440,2023-01-08,"""12:16:34 PM""",124,"""Spaghetti""","""Italian""",14.5
1006,440,2023-01-08,"""12:16:34 PM""",125,"""Spaghetti & Meatballs""","""Italian""",17.95
1007,440,2023-01-08,"""12:16:34 PM""",125,"""Spaghetti & Meatballs""","""Italian""",17.95
…,…,…,…,…,…,…,…
1012,440,2023-01-08,"""12:16:34 PM""",113,"""Edamame""","""Asian""",5.0
1013,440,2023-01-08,"""12:16:34 PM""",122,"""Chips & Salsa""","""Mexican""",7.0
1014,440,2023-01-08,"""12:16:34 PM""",131,"""Chicken Parmesan""","""Italian""",17.95
1015,440,2023-01-08,"""12:16:34 PM""",106,"""French Fries""","""American""",7.0


In [21]:
(
    highest_spend_order
    .group_by("category")
    .agg(
        spend = pl.col("price").sum(),
        count = pl.len()
    )
    .sort("spend", descending=True)
    .collect()
)

category,spend,count
str,f64,u32
"""Italian""",132.25,8
"""Asian""",22.95,2
"""Mexican""",20.95,2
"""American""",16.0,2


In [22]:
(
    combined_menu_order
    .filter(pl.col("order_id").is_in([440, 2075, 1957, 330, 2675]))
    .group_by("category")
    .agg(
        spend = pl.col("price").sum(),
        count = pl.len()
    )
    .sort("spend", descending=True)
    .collect()
)

category,spend,count
str,f64,u32
"""Italian""",430.65,26
"""Asian""",228.65,17
"""Mexican""",189.45,16
"""American""",99.35,10


In [23]:
(
    combined_menu_order
    .filter(pl.col("order_id").is_in([440, 2075, 1957, 330, 2675]))
    .group_by("order_id", "category")
    .agg(
        spend = pl.col("price").sum(),
        count = pl.len()
    )
    .sort("spend", descending=True)
    .collect()
)

order_id,category,spend,count
i64,str,f64,u32
440,"""Italian""",132.25,8
2075,"""Italian""",99.8,6
330,"""Asian""",87.4,6
1957,"""Italian""",84.3,5
2675,"""Italian""",63.9,4
…,…,…,…
440,"""Asian""",22.95,2
440,"""Mexican""",20.95,2
440,"""American""",16.0,2
2075,"""American""",13.95,1
