In [1]:
%pip install mysql-connector-python

Note: you may need to restart the kernel to use updated packages.




In [4]:
import mysql.connector
import os
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [6]:
db_user = os.environ.get('my_username')
db_password = os.environ.get('my_password')
db = mysql.connector.connect(host= "localhost", user = db_user, passwd = db_password, database = "danny_pizza")
db

<mysql.connector.connection_cext.CMySQLConnection at 0x21b91880d30>

These are the several questions for this case study

A. Pizza Metrics

- How many pizzas were ordered?
- How many unique customer orders were made?
- How many successful orders were delivered by each runner?
- How many of each type of pizza was delivered?
- How many Vegetarian and Meatlovers were ordered by each customer?
- What was the maximum number of pizzas delivered in a single order?
- For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
- How many pizzas were delivered that had both exclusions and extras?
- What was the total volume of pizzas ordered for each hour of the day?
- What was the volume of orders for each day of the week?
<br/>

B. Runner and Customer Experience

- How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
- What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
- Is there any relationship between the number of pizzas and how long the order takes to prepare?
- What was the average distance travelled for each customer?
- What was the difference between the longest and shortest delivery times for all orders?
- What was the average speed for each runner for each delivery and do you notice any trend for these values?
- What is the successful delivery percentage for each runner?
<br/>

C. Ingredient Optimisation

- What are the standard ingredients for each pizza?
- What was the most commonly added extra?
- What was the most common exclusion?
- Generate an order item for each record in the customers_orders table in the format of one of the following:
- Meat Lovers
- Meat Lovers - Exclude Beef
- Meat Lovers - Extra Bacon
- Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
- Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
- What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
<br/>

D. Pricing and Ratings

- If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
- What if there was an additional $1 charge for any pizza extras? Add cheese is $1 extra
- The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
- Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
customer_id
order_id
runner_id
rating
order_time
pickup_time
Time between order and pickup
Delivery duration
Average speed
Total number of pizzas
- If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?

# Queries

# A. Pizza Metrics

How many pizzas were ordered?

In [7]:
a1 =pd.read_sql_query("select count(order_id) as pizza_count from customer_orders",db)
a1

Unnamed: 0,pizza_count
0,14


How many unique customer orders were made?

In [9]:
a2 =pd.read_sql_query("select count(distinct order_id) as unique_pizza_count from customer_orders",db)
a2

Unnamed: 0,unique_pizza_count
0,10


How many successful orders were delivered by each runner?

In [14]:
a3=pd.read_sql_query("select runner_id, count(order_id) from runner_orders where distance != 0 group by runner_id",db)
a3

Unnamed: 0,runner_id,count(order_id)
0,1,4
1,2,3
2,3,1


How many of each type of pizza was delivered?

In [13]:
a4 =pd.read_sql_query("""select co.pizza_id, pn.pizza_name, (co.pizza_id) as unique_pizza_count 
from customer_orders co  
join runner_orders ro on co.order_id = ro.order_id 
join pizza_names pn on co.pizza_id = pn.pizza_id
where ro.distance != 0 
group by co.pizza_id""",db)
a4

Unnamed: 0,pizza_id,pizza_name,unique_pizza_count
0,1,Meatlovers,1
1,2,Vegetarian,2


How many Vegetarian and Meatlovers were ordered by each customer?

In [15]:
a5 =pd.read_sql_query("""select co.customer_id, pn.pizza_name, count(co.pizza_id) from customer_orders co 
join pizza_names pn on co.pizza_id = pn.pizza_id
group by co.customer_id,co.pizza_id
order by co.customer_id""",db)
a5

Unnamed: 0,customer_id,pizza_name,count(co.pizza_id)
0,101,Meatlovers,2
1,101,Vegetarian,1
2,102,Meatlovers,2
3,102,Vegetarian,1
4,103,Meatlovers,3
5,103,Vegetarian,1
6,104,Meatlovers,3
7,105,Vegetarian,1


What was the maximum number of pizzas delivered in a single order?

In [16]:
a6 =pd.read_sql_query("""select co.order_id, count(co.order_id)
from customer_orders co 
join runner_orders ro on co.order_id = ro.order_id
where distance != 0 
group by co.order_id""",db)
a6

Unnamed: 0,order_id,count(co.order_id)
0,1,1
1,2,1
2,3,2
3,4,3
4,5,1
5,7,1
6,8,1
7,10,2


For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

How many pizzas were delivered that had both exclusions and extras? FIX

In [18]:
a8 =pd.read_sql_query("""with new_table as (select co.order_id as A, co.exclusions as exc, co.extras as ext
from customer_orders co 
join runner_orders ro on co.order_id = ro.order_id 
where ro.distance != 0)
select count(A) from new_table where new_table.exc != null or new_table.ext != null""",db)
a8

Unnamed: 0,count(A)
0,0


What was the total volume of pizzas ordered for each hour of the day?

In [19]:
a9 =pd.read_sql_query("""select  hour(co.order_time) as each_hour, count(co.order_id)
from customer_orders co 
group by each_hour
order by each_hour asc""",db)
a9

Unnamed: 0,each_hour,count(co.order_id)
0,11,1
1,13,3
2,18,3
3,19,1
4,21,3
5,23,3


What was the volume of orders for each day of the week?

In [20]:
a10 =pd.read_sql_query("SELECT DATE_FORMAT(order_time, '%W') AS day_of_week, count(order_id) FROM customer_orders group by day_of_week",db)
a10

Unnamed: 0,day_of_week,count(order_id)
0,Wednesday,5
1,Thursday,3
2,Saturday,5
3,Friday,1


# B. Runner and Customer experiences

How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)

In [21]:
b1 =pd.read_sql_query("""SELECT (DAYOFMONTH(registration_date) - 1) DIV 7 + 1 AS week_of_month, count(runner_id) as num_runner_signed_up
FROM runners group by week_of_month""",db)
b1

Unnamed: 0,week_of_month,num_runner_signed_up
0,1,2
1,2,1
2,3,1


What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order

In [22]:
b2 =pd.read_sql_query("""select avg(MINUTE(TIMEDIFF(ro.pickup_time, co.order_time))) AS avg_time
from customer_orders co 
join runner_orders ro on co.order_id = ro.order_id
where ro.duration != 0""",db)
b2

Unnamed: 0,avg_time
0,18.25


# C. Ingredient optimisation

What are the standard ingredients for each pizza

In [23]:
c1 = pd.read_sql_query("""select pn.pizza_id, pn.pizza_name, pr.toppings, pt.topping_name
from pizza_names pn 
join pizza_recipes pr on pn.pizza_id = pr.pizza_id
join pizza_toppings pt on pt.topping_id = pr.toppings""",db)
c1

Unnamed: 0,pizza_id,pizza_name,toppings,topping_name
0,1,Meatlovers,"1, 2, 3, 4, 5, 6, 8, 10",Bacon
1,2,Vegetarian,"4, 6, 7, 9, 11, 12",Cheese
