# Pizza Runner Case Study:

## Table of Contents

- [Raw data](#Raw-data)
- [Data Cleaning](#Data-Cleaning)
- [Case Study Questions](#A.-Pizza-Metrics) *(with increasing levels of difficulty)*
    - [Pizza Metrics](#A.-Pizza-Metrics)
    - [Runner and Customer Experience](#B.-Runner-and-Customer-Experience)
    - [Ingredient Optimisation](#C.-Ingredient-Optimisation)
    - [Pricing and Ratings](#D.-Pricing-and-Ratings)


## Raw data

The dataset has 6 tables: **runners** | **customer_orders** | **runner_orders** | **pizza_names** | **pizza_recipes** and **pizza_toppings**

### Entity Relationship Diagram

![Entity Relationship Diagram](https://drive.google.com/uc?id=1G7_rerPWGHOumDbz3us9oqDPGQ8o1sGV)

[Back to top](#Pizza-Runner-Case-Study:)

## Data Cleaning

### (1/3) Data Cleaning: Table = runner_orders | Column to be cleaned = cancellation

In [1]:
%reload_ext sql

In [2]:
%%sql

mysql://root:MyN3wP4ssw0rd@localhost:3306/uditdb

In [108]:
%%sql 
SELECT *                -- Before Data Cleaning
FROM   runner_orders;

 * mysql://root:***@localhost:3306/uditdb
10 rows affected.


order_id,runner_id,pickup_time,distance,duration,cancellation
1,1,2020-01-01 18:15:34,20km,32 minutes,
2,1,2020-01-01 19:10:54,20km,27 minutes,
3,1,2020-01-03 00:12:37,13.4km,20 mins,
4,2,2020-01-04 13:53:03,23.4,40,
5,3,2020-01-08 21:10:57,10,15,
6,3,,,,Restaurant Cancellation
7,2,2020-01-08 21:30:45,25km,25mins,
8,2,2020-01-10 00:15:02,23.4 km,15 minute,
9,2,,,,Customer Cancellation
10,1,2020-01-11 18:50:20,10km,10minutes,


In [109]:
%%sql
UPDATE runner_orders
SET cancellation = "Not Cancelled"
WHERE cancellation NOT IN ("Restaurant Cancellation", "Customer Cancellation") OR cancellation IS NULL;

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


[]

In [110]:
%%sql
SELECT *                -- After Data Cleaning
FROM   runner_orders; 

 * mysql://root:***@localhost:3306/uditdb
10 rows affected.


order_id,runner_id,pickup_time,distance,duration,cancellation
1,1,2020-01-01 18:15:34,20km,32 minutes,Not Cancelled
2,1,2020-01-01 19:10:54,20km,27 minutes,Not Cancelled
3,1,2020-01-03 00:12:37,13.4km,20 mins,Not Cancelled
4,2,2020-01-04 13:53:03,23.4,40,Not Cancelled
5,3,2020-01-08 21:10:57,10,15,Not Cancelled
6,3,,,,Restaurant Cancellation
7,2,2020-01-08 21:30:45,25km,25mins,Not Cancelled
8,2,2020-01-10 00:15:02,23.4 km,15 minute,Not Cancelled
9,2,,,,Customer Cancellation
10,1,2020-01-11 18:50:20,10km,10minutes,Not Cancelled


[Back to top](#Pizza-Runner-Case-Study:)

### (2/3) Data Cleaning: Table = customer_orders | Columns to be cleaned = exclusions, extras

In [119]:
%%sql

SELECT DISTINCT exclusions,           -- Before Data Cleaning
                extras
FROM   customer_orders;

 * mysql://root:***@localhost:3306/uditdb
7 rows affected.


exclusions,extras
,
,
4,
,1
,
4,"1, 5"
"2, 6","1, 4"


In [120]:
%%sql

UPDATE customer_orders
SET exclusions = 0
WHERE exclusions IN ("null", "") OR exclusions IS NULL;
UPDATE customer_orders
SET extras = 0
WHERE extras IN ("null", "") OR extras IS NULL;

 * mysql://root:***@localhost:3306/uditdb
9 rows affected.
10 rows affected.


[]

In [121]:
%%sql

SELECT DISTINCT exclusions,           -- After Data Cleaning
                extras
FROM   customer_orders;

 * mysql://root:***@localhost:3306/uditdb
5 rows affected.


exclusions,extras
0,0
4,0
0,1
4,"1, 5"
"2, 6","1, 4"


[Back to top](#Pizza-Runner-Case-Study:)

### (3/3) Data Cleaning: Table = pizza_recipes | Column to be cleaned = toppings

In [269]:
%%sql
SELECT * FROM pizza_recipes

 * mysql://root:***@localhost:3306/uditdb
2 rows affected.


pizza_id,toppings
1,"1, 2, 3, 4, 5, 6, 8, 10"
2,"4, 6, 7, 9, 11, 12"


**Problem**: To map the "toppings" column to other tables, each comma-separated value must be split into its own row.<br><br> **Solution**: Create a helper table **numbers** holding the integers from 1 to the maximum number of items in any cell of the column. Here that maximum is 8, because pizza_id 1 has 8 toppings.

In [270]:
%%sql
SELECT * FROM numbers

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


number
1
2
3
4
5
6
7
8


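Substring_index is then used to do the splitting, and the result is stored in a new table, **pizza_recipes_mod**. For each number n, the inner call keeps the first n items of the list and the outer call keeps the last item of that prefix, which is the n-th topping.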
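
The nested call can be easier to follow outside SQL. The following is a minimal Python sketch, not the notebook's implementation: `substring_index` and `split_toppings` are hypothetical helpers that imitate MySQL's `SUBSTRING_INDEX` and the join against the **numbers** table, using the two recipe rows shown above.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Imitate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the |count|-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


def split_toppings(recipes: dict, numbers: list) -> list:
    """Return one (pizza_id, topping) row per item in each toppings cell."""
    rows = []
    for pizza_id, toppings in sorted(recipes.items()):
        # Number of commas in the cell, same idea as LENGTH(x) - LENGTH(REPLACE(x, ',', '')).
        comma_count = len(toppings) - len(toppings.replace(",", ""))
        for n in numbers:
            # Join condition: keep n only while the cell has at least n items.
            if comma_count >= n - 1:
                first_n_items = substring_index(toppings, ", ", n)
                rows.append((pizza_id, substring_index(first_n_items, ", ", -1)))
    return rows


# The two rows of pizza_recipes and the numbers 1..8, as shown above.
recipes = {1: "1, 2, 3, 4, 5, 6, 8, 10", 2: "4, 6, 7, 9, 11, 12"}
numbers = list(range(1, 9))

rows = split_toppings(recipes, numbers)
for row in rows:
    print(row)
```

A cell with 8 toppings yields 8 rows and a cell with 6 toppings yields 6, which accounts for the 14 rows in **pizza_recipes_mod**.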

In [267]:
%%sql
USE uditdb;

CREATE TABLE pizza_recipes_mod AS
  -- Inner call keeps the first n items; outer call keeps the last of those, i.e. the n-th topping
  SELECT pizza_recipes.pizza_id,
         Substring_index(Substring_index(pizza_recipes.toppings, ', ', numbers.number),
                         ', ', -1) AS toppings
  FROM   numbers
         INNER JOIN pizza_recipes
                 -- number of commas + 1 = number of toppings in the cell
                 ON Length(pizza_recipes.toppings) - Length(Replace(pizza_recipes.toppings, ',', '')) >= numbers.number - 1
  ORDER  BY pizza_id,
            number;

 * mysql://root:***@localhost:3306/uditdb
0 rows affected.
14 rows affected.


[]

In [271]:
%%sql
SELECT * FROM pizza_recipes_mod

 * mysql://root:***@localhost:3306/uditdb
14 rows affected.


pizza_id,toppings
1,1
1,2
1,3
1,4
1,5
1,6
1,8
1,10
2,4
2,6


[Back to top](#Pizza-Runner-Case-Study:)

## A. Pizza Metrics

1. [How many pizzas were ordered?](#A1.-How-many-pizzas-were-ordered?)<br><br>
2. [How many unique customer orders were made?](#A2.-How-many-unique-customer-orders-were-made?)<br><br>
3. [How many successful orders were delivered by each runner?](#A3.-How-many-successful-orders-were-delivered-by-each-runner?)<br><br>
4. [How many of each type of pizza was delivered?](#A4.-How-many-of-each-type-of-pizza-was-delivered?)<br><br>
5. [How many Vegetarian and Meatlovers were ordered by each customer?](#A5.-How-many-Vegetarian-and-Meatlovers-were-ordered-by-each-customer?)<br><br>
6. [What was the maximum number of pizzas delivered in a single order?](#A6.-What-was-the-maximum-number-of-pizzas-delivered-in-a-single-order?)<br><br>
7. [For each customer, how many delivered pizzas had at least 1 change and how many had no changes?](#A7.-For-each-customer,-how-many-delivered-pizzas-had-at-least-1-change-and-how-many-had-no-changes?)<br><br>
8. [How many pizzas were delivered that had both exclusions and extras?](#A8.-How-many-pizzas-were-delivered-that-had-both-exclusions-and-extras?)<br><br>
9. [What was the total volume of pizzas ordered for each hour of the day?](#A9.-What-was-the-total-volume-of-pizzas-ordered-for-each-hour-of-the-day?)<br><br>
10. [What was the volume of orders for each day of the week?](#A10.-What-was-the-volume-of-orders-for-each-day-of-the-week?)<br><br>

[Back to top](#Pizza-Runner-Case-Study:)

## B. Runner and Customer Experience

1. [How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)](#B1.-How-many-runners-signed-up-for-each-1-week-period?)<br><br>
2. [What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?](#B2.-What-was-the-average-time-in-minutes-it-took-for-each-runner-to-arrive-at-the-Pizza-Runner-HQ-to-pickup-the-order?)<br><br>
3. [Is there any relationship between the number of pizzas and how long the order takes to prepare?](#B3.-Is-there-any-relationship-between-the-number-of-pizzas-and-how-long-the-order-takes-to-prepare?)<br><br>
4. [What was the average distance travelled for each customer?](#B4.-What-was-the-average-distance-travelled-for-each-customer?)<br><br>
5. [What was the difference between the longest and shortest delivery times for all orders?](#B5.-What-was-the-difference-between-the-longest-and-shortest-delivery-times-for-all-orders?)<br><br>
6. [What was the average speed for each runner for each delivery and do you notice any trend for these values?](#B6.-What-was-the-average-speed-for-each-runner-for-each-delivery-and-do-you-notice-any-trend-for-these-values?)<br><br>
7. [What is the successful delivery percentage for each runner?](#B7.-What-is-the-successful-delivery-percentage-for-each-runner?)<br><br>

[Back to top](#Pizza-Runner-Case-Study:)

## C. Ingredient Optimisation

1. [What are the standard ingredients for each pizza?](#C1.-What-are-the-standard-ingredients-for-each-pizza?)<br><br>
2. [What was the most commonly added extra?](#C2.-What-was-the-most-commonly-added-extra?)<br><br>
3. [What was the most common exclusion?](#C3.-What-was-the-most-common-exclusion?)<br><br>
4. [Generate an order item for each record in the customers_orders table in certain format](#C4.-Generate-an-order-item-for-each-record-in-the-customers_orders-table-in-the-format-of-one-of-the-following:)<br><br>
5. [Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients](#C5.-Generate-an-alphabetically-ordered-comma-separated-ingredient-list-for-each-pizza-order-from-the-customer_orders-table-and-add-a-2x-in-front-of-any-relevant-ingredients)<br><br>
6. [What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?](#C6.-What-is-the-total-quantity-of-each-ingredient-used-in-all-delivered-pizzas-sorted-by-most-frequent-first?)<br><br>

[Back to top](#Pizza-Runner-Case-Study:)

## D. Pricing and Ratings

1. [If a Meat Lovers pizza costs dollar 12 and Vegetarian costs dollar 10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?](#D1.-If-a-Meat-Lovers-pizza-costs-dollar-12-and-Vegetarian-costs-dollar-10-and-there-were-no-charges-for-changes---how-much-money-has-Pizza-Runner-made-so-far-if-there-are-no-delivery-fees?)<br><br>
2. [What if there was an additional dollar 1 charge for any pizza extras?](#D2.-What-if-there-was-an-additional-dollar-1-charge-for-any-pizza-extras?)<br><br>
3. [The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.](#D3.-The-Pizza-Runner-team-now-wants-to-add-an-additional-ratings-system-that-allows-customers-to-rate-their-runner,-how-would-you-design-an-additional-table-for-this-new-dataset---generate-a-schema-for-this-new-table-and-insert-your-own-data-for-ratings-for-each-successful-customer-order-between-1-to-5.)<br><br>
4. [Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?](#D4.-Using-your-newly-generated-table---can-you-join-all-of-the-information-together-to-form-a-table-which-has-the-following-information-for-successful-deliveries?)<br><br>
5. [If a Meat Lovers pizza was dollar 12 and Vegetarian dollar 10 fixed prices with no cost for extras and each runner is paid dollar 0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?](#D5.-If-a-Meat-Lovers-pizza-was-dollar-12-and-Vegetarian-dollar-10-fixed-prices-with-no-cost-for-extras-and-each-runner-is-paid-dollar-0.30-per-kilometre-traveled---how-much-money-does-Pizza-Runner-have-left-over-after-these-deliveries?)<br><br>

[Back to top](#Pizza-Runner-Case-Study:)

# A. Pizza Metrics

## A1. How many pizzas were ordered?

In [28]:
%%sql

SELECT Count(pizza_id) AS total_pizza_ordered
FROM   customer_orders;

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


total_pizza_ordered
14


[Back to top](#Pizza-Runner-Case-Study:)

## A2. How many unique customer orders were made?

In [123]:
%%sql

SELECT Count(DISTINCT( order_id )) AS total_unique_orders
FROM   customer_orders; 

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


total_unique_orders
10


[Back to top](#Pizza-Runner-Case-Study:)

## A3. How many successful orders were delivered by each runner?

In [124]:
%%sql

SELECT runner_id,
       Count(order_id) AS successful_orders
FROM   runner_orders
WHERE  cancellation = "Not Cancelled"
GROUP  BY 1;

 * mysql://root:***@localhost:3306/uditdb
3 rows affected.


runner_id,successful_orders
1,4
2,3
3,1


[Back to top](#Pizza-Runner-Case-Study:)

## A4. How many of each type of pizza was delivered?

In [89]:
%%sql

SELECT p.pizza_name,
       Count(c.pizza_id) AS Pizza_Delivered
FROM   runner_orders r
       JOIN customer_orders c
         ON r.order_id = c.order_id
       JOIN pizza_names p
         ON c.pizza_id = p.pizza_id
WHERE  cancellation = "Not Cancelled"
GROUP  BY 1

 * mysql://root:***@localhost:3306/uditdb
2 rows affected.


pizza_name,Pizza_Delivered
Meatlovers,9
Vegetarian,3


[Back to top](#Pizza-Runner-Case-Study:)

## A5. How many Vegetarian and Meatlovers were ordered by each customer?

Cancelled orders are included here, because the question asks about pizzas ordered, whether or not they were delivered.

In [125]:
%%sql

SELECT c.customer_id,
       p.pizza_name,
       Count(c.pizza_id) AS pizza_ordered
FROM   runner_orders r
       JOIN customer_orders c
         ON r.order_id = c.order_id
       JOIN pizza_names p
         ON c.pizza_id = p.pizza_id
GROUP  BY 1,
          2
ORDER  BY 1

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


customer_id,pizza_name,pizza_ordered
101,Meatlovers,2
101,Vegetarian,1
102,Meatlovers,2
102,Vegetarian,1
103,Meatlovers,3
103,Vegetarian,1
104,Meatlovers,3
105,Vegetarian,1


[Back to top](#Pizza-Runner-Case-Study:)

## A6. What was the maximum number of pizzas delivered in a single order?

Cancelled orders are excluded here, because the question asks about pizzas that were actually delivered.

In [126]:
%%sql

WITH ranked_orders
     AS (SELECT r.order_id,
                Count(c.pizza_id)                    AS pizza_delivered,
                Rank()
                  OVER(
                    ORDER BY Count(c.pizza_id) DESC) AS ranks
         FROM   runner_orders r
                JOIN customer_orders c
                  ON r.order_id = c.order_id
         WHERE  r.cancellation = "Not Cancelled"
         GROUP  BY 1)
SELECT order_id,
       pizza_delivered
FROM   ranked_orders
WHERE  ranks = 1;

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


order_id,pizza_delivered
4,3


[Back to top](#Pizza-Runner-Case-Study:)

## A7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

In [129]:
%%sql

SELECT c.customer_id,
       Count(CASE
               WHEN c.exclusions <> 0
                     OR c.extras <> 0 THEN c.pizza_id
             END) AS delivered_pizzas_with_changes,
       Count(CASE
               WHEN c.exclusions = 0
                    AND c.extras = 0 THEN c.pizza_id
             END) AS delivered_pizzas_with_no_changes
FROM   customer_orders c
       JOIN runner_orders r
         ON c.order_id = r.order_id
WHERE  r.cancellation = "Not Cancelled"
GROUP  BY 1

 * mysql://root:***@localhost:3306/uditdb
5 rows affected.


customer_id,delivered_pizzas_with_changes,delivered_pizzas_with_no_changes
101,0,2
102,0,3
103,3,0
104,2,1
105,1,0


[Back to top](#Pizza-Runner-Case-Study:)

## A8. How many pizzas were delivered that had both exclusions and extras?

In [130]:
%%sql

SELECT Count(c.pizza_id) AS delivered_pizzas_with_changes
FROM   customer_orders c
       JOIN runner_orders r
         ON c.order_id = r.order_id
WHERE  r.cancellation = "Not Cancelled"
       AND c.exclusions <> 0
       AND c.extras <> 0 

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


delivered_pizzas_with_changes
1


[Back to top](#Pizza-Runner-Case-Study:)

## A9. What was the total volume of pizzas ordered for each hour of the day?

In [135]:
%%sql 

SELECT Hour(order_time) AS hour_of_the_day,
       Count(pizza_id)  AS pizza_ordered
FROM   customer_orders
GROUP  BY 1
ORDER  BY 1

 * mysql://root:***@localhost:3306/uditdb
6 rows affected.


hour_of_the_day,pizza_ordered
11,1
13,3
18,3
19,1
21,3
23,3


[Back to top](#Pizza-Runner-Case-Study:)

## A10. What was the volume of orders for each day of the week?

In [142]:
%%sql 

SELECT Dayname(order_time) AS day_of_the_week,
       Count(pizza_id)     AS pizza_ordered
FROM   customer_orders
GROUP  BY 1
ORDER  BY 1 

 * mysql://root:***@localhost:3306/uditdb
4 rows affected.


day_of_the_week,pizza_ordered
Friday,1
Saturday,5
Thursday,3
Wednesday,5


[Back to top](#Pizza-Runner-Case-Study:)

# B. Runner and Customer Experience

## B1. How many runners signed up for each 1 week period?

"WEEK(date)" cannot be used here because it returns the default calendar week numbers. To bucket weeks from a specific start date (here 2021-01-01), the dates must be divided into 7-day periods explicitly.
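
The bucketing arithmetic can be checked outside the database. This is a small Python sketch of the same formula, `CEILING((DATEDIFF(date, start) + 1) / 7)`; the function name `signup_week` and the sample dates are illustrative assumptions, not values read from the **runners** table.

```python
import math
from datetime import date


def signup_week(registration_date: date, start: date = date(2021, 1, 1)) -> int:
    """Return the 1-based 7-day period that contains registration_date."""
    # Same quantity as DATEDIFF(registration_date, start) in MySQL.
    days_since_start = (registration_date - start).days
    # Adding 1 makes the start date itself fall into week 1 rather than week 0.
    return math.ceil((days_since_start + 1) / 7)


# Illustrative dates only: the first and last day of week 1, then the first day of week 2.
for sample in (date(2021, 1, 1), date(2021, 1, 7), date(2021, 1, 8)):
    print(sample.isoformat(), "-> week", signup_week(sample))
```

Days 0 to 6 after the start date fall in week 1, days 7 to 13 in week 2, and so on.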

In [171]:
%%sql

SELECT Concat("Week: ", Ceiling(( Datediff(registration_date, '2021-01-01')
                                   + 1 ) / 7)) AS weeks,
       Count(runner_id)                        AS runners_joined
FROM   runners
GROUP  BY 1

 * mysql://root:***@localhost:3306/uditdb
3 rows affected.


weeks,runners_joined
Week: 1,2
Week: 2,1
Week: 3,1


[Back to top](#Pizza-Runner-Case-Study:)

## B2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?

In [35]:
%%sql
SELECT    p.runner_id,
          Round(Avg(Time_to_sec(Timediff(p.pickup_time, c.order_time))/60)) AS avg_minutes_took
FROM      (
                 SELECT order_id,
                        pickup_time,
                        runner_id
                 FROM   runner_orders
                 WHERE  cancellation = "Not Cancelled") p
LEFT JOIN
          (
                   SELECT   order_id,
                            Max(order_time) AS order_time
                   FROM     customer_orders
                   GROUP BY 1) c
ON        p.order_id = c.order_id
GROUP BY  1

 * mysql://root:***@localhost:3306/uditdb
3 rows affected.


runner_id,avg_minutes_took
1,14
2,20
3,10


[Back to top](#Pizza-Runner-Case-Study:)

## B3. Is there any relationship between the number of pizzas and how long the order takes to prepare?

In [43]:
%%sql

SELECT    p.order_id,
          c.pizza_count,
          Round(Time_to_sec(Timediff(p.pickup_time, c.order_time))/60) AS pick_up_time_in_Minutes,
          Round((Time_to_sec(Timediff(p.pickup_time, c.order_time))/60)/c.pizza_count) AS avg_time_per_pizza_in_Minutes
FROM      (
                 SELECT order_id,
                        pickup_time
                 FROM   runner_orders
                 WHERE  cancellation = "Not Cancelled") p
LEFT JOIN
          (
                   SELECT   order_id,
                            Count(pizza_id) AS pizza_count,
                            Max(order_time) AS order_time
                   FROM     customer_orders
                   GROUP BY 1) c
ON        p.order_id = c.order_id
ORDER BY  2

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


order_id,pizza_count,pick_up_time_in_Minutes,avg_time_per_pizza_in_Minutes
1,1,11,11
2,1,10,10
5,1,10,10
7,1,10,10
8,1,20,20
3,2,21,11
10,2,16,8
4,3,29,10


Hence, an order takes longer to prepare when it contains more pizzas. On average it takes about 10 minutes to prepare one pizza; the one exception is order 8, which took 20 minutes.
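
As a rough check of this conclusion, the Python sketch below averages the pickup time for each pizza count. The rows are copied from the query output above; the variable names are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# (order_id, pizza_count, pick_up_time_in_minutes), copied from the output above.
rows = [
    (1, 1, 11), (2, 1, 10), (5, 1, 10), (7, 1, 10), (8, 1, 20),
    (3, 2, 21), (10, 2, 16),
    (4, 3, 29),
]

# Collect the pickup times of all orders that share the same pizza count.
minutes_by_count = defaultdict(list)
for _order_id, pizza_count, minutes in rows:
    minutes_by_count[pizza_count].append(minutes)

# For each pizza count: average minutes per order and average minutes per pizza.
summary = {
    pizza_count: (mean(minutes), mean(minutes) / pizza_count)
    for pizza_count, minutes in sorted(minutes_by_count.items())
}

for pizza_count, (per_order, per_pizza) in summary.items():
    print(f"{pizza_count} pizza(s): {per_order:.1f} min per order, {per_pizza:.1f} min per pizza")
```

The per-order average rises with the pizza count while the per-pizza average stays near 10 minutes. The single-pizza average is pulled up by order 8.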

[Back to top](#Pizza-Runner-Case-Study:)

## B4. What was the average distance travelled for each customer?

In [60]:
%%sql

SELECT c.customer_id,
       Round(Avg(r.distance)) AS avg_dist_travelled
FROM   runner_orders r
       LEFT JOIN customer_orders c
              ON r.order_id = c.order_id
WHERE  r.cancellation = "Not Cancelled"
GROUP  BY 1

 * mysql://root:***@localhost:3306/uditdb
5 rows affected.


customer_id,avg_dist_travelled
101,20.0
102,17.0
103,23.0
104,10.0
105,25.0


[Back to top](#Pizza-Runner-Case-Study:)

## B5. What was the difference between the longest and shortest delivery times for all orders?

In [93]:
%%sql
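-- Note: duration is stored as text ("32 minutes", "40", ...), so Min/Max compare strings here.
-- That gives the right values for this data; Min(Cast(duration AS UNSIGNED)) would be safer in general.
SELECT Cast(Min(duration) AS UNSIGNED) AS shortest,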
       Cast(Max(duration) AS UNSIGNED) AS longest,
       Max(duration) - Min(duration)   AS difference
FROM   runner_orders
WHERE  cancellation = "Not Cancelled"

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


shortest,longest,difference
10,40,30.0


[Back to top](#Pizza-Runner-Case-Study:)

## B6. What was the average speed for each runner for each delivery and do you notice any trend for these values?

In [109]:
%%sql

SELECT order_id,
       runner_id,
       Round(( distance / duration ) * 60) AS speed_KMPH
FROM   runner_orders
WHERE  cancellation = "Not Cancelled"
ORDER  BY 1

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


order_id,runner_id,speed_KMPH
1,1,38.0
2,1,44.0
3,1,40.0
4,2,35.0
5,3,40.0
7,2,60.0
8,2,94.0
10,1,60.0


There is no clear relationship between the runner and the average speed. The 94 km/h recorded for order 8 looks far too fast; the distance may have been entered incorrectly (for example, 23.4 km recorded where 13.4 km was meant).
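
The speed calculation relies on MySQL reading the leading number from text values such as "23.4 km" and "15 minute". The Python sketch below reproduces that arithmetic for order 8. The helpers `leading_number` and `speed_kmph` are hypothetical, and the regular expression only approximates MySQL's implicit string-to-number conversion.

```python
import re


def leading_number(text: str) -> float:
    """Return the number at the start of text, or 0.0 if there is none."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else 0.0


def speed_kmph(distance: str, duration: str) -> float:
    """Average speed in km/h from a distance in km and a duration in minutes."""
    # Assumes a delivered order, so the duration is a positive number of minutes.
    return leading_number(distance) / leading_number(duration) * 60


# Order 8 as stored in runner_orders: 23.4 km in 15 minutes.
recorded = speed_kmph("23.4 km", "15 minute")
# Hypothetical corrected distance of 13.4 km over the same 15 minutes.
if_corrected = speed_kmph("13.4 km", "15 minute")

print(f"recorded: {recorded:.1f} km/h, with 13.4 km: {if_corrected:.1f} km/h")
```

With the recorded distance the speed is about 93.6 km/h, which the query rounds to 94. With 13.4 km it would be about 53.6 km/h, which is in line with the other deliveries.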

[Back to top](#Pizza-Runner-Case-Study:)

## B7. What is the successful delivery percentage for each runner?

A delivery counts as successful when the order has not been cancelled.
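
The same ratio can be sketched in Python as delivered orders divided by all orders assigned to each runner. The rows below are the (order_id, runner_id, cancellation) values from the cleaned **runner_orders** table shown in the Data Cleaning section; `success_percentage` is a hypothetical helper name.

```python
from collections import Counter

# (order_id, runner_id, cancellation) from the cleaned runner_orders table.
orders = [
    (1, 1, "Not Cancelled"), (2, 1, "Not Cancelled"), (3, 1, "Not Cancelled"),
    (4, 2, "Not Cancelled"), (5, 3, "Not Cancelled"),
    (6, 3, "Restaurant Cancellation"),
    (7, 2, "Not Cancelled"), (8, 2, "Not Cancelled"),
    (9, 2, "Customer Cancellation"),
    (10, 1, "Not Cancelled"),
]


def success_percentage(rows):
    """Return {runner_id: percentage of that runner's orders that were not cancelled}."""
    total = Counter()
    delivered = Counter()
    for _order_id, runner_id, cancellation in rows:
        total[runner_id] += 1
        if cancellation == "Not Cancelled":
            delivered[runner_id] += 1
    return {runner: round(delivered[runner] / total[runner] * 100) for runner in sorted(total)}


percentages = success_percentage(orders)
print(percentages)
```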

In [149]:
%%sql

SELECT runner_id,
       Round(( ( Count(CASE
                         WHEN cancellation = "Not Cancelled" THEN order_id
                       END) ) / Count(order_id) ) * 100) AS
       successful_delivery_percentage
FROM   runner_orders
GROUP  BY 1

 * mysql://root:***@localhost:3306/uditdb
3 rows affected.


runner_id,successful_delivery_percentage
1,100
2,75
3,50


[Back to top](#Pizza-Runner-Case-Study:)

# C. Ingredient Optimisation

## C1. What are the standard ingredients for each pizza?

In [6]:
%%sql
SELECT n.pizza_name,
       Group_concat(t.topping_name) as ingredients
FROM   pizza_recipes_mod m
       JOIN pizza_toppings t
         ON m.toppings = t.topping_id
       JOIN pizza_names n
         ON m.pizza_id = n.pizza_id
GROUP  BY 1
ORDER  BY 1

 * mysql://root:***@localhost:3306/uditdb
2 rows affected.


pizza_name,ingredients
Meatlovers,"Bacon,BBQ Sauce,Beef,Cheese,Chicken,Mushrooms,Pepperoni,Salami"
Vegetarian,"Cheese,Mushrooms,Onions,Peppers,Tomatoes,Tomato Sauce"


[Back to top](#Pizza-Runner-Case-Study:)

## C2. What was the most commonly added extra?

In [18]:
%%sql

WITH customer_orders_mod AS
(
       SELECT order_id,
              customer_id,
              pizza_id,
              order_time,
              Substring_index(exclusions, ', ',1) AS exclusions ,
              Substring_index(extras, ', ',1)     AS extras
       FROM   customer_orders
       UNION
       SELECT order_id,
              customer_id,
              pizza_id,
              order_time,
              Substring_index(exclusions, ', ',-1) AS exclusions ,
              Substring_index(extras, ', ',-1)     AS extras
       FROM   customer_orders)


SELECT   t.topping_name,
         Count(cm.extras) AS count_of_pizzas
FROM     customer_orders_mod cm
JOIN     pizza_toppings t
ON       cm.extras = t.topping_id
WHERE    extras <> 0
GROUP BY 1
ORDER BY 2 DESC limit 1

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


topping_name,count_of_pizzas
Bacon,4


[Back to top](#Pizza-Runner-Case-Study:)

## C3. What was the most common exclusion?

In [17]:
%%sql

 
WITH customer_orders_mod AS
(SELECT order_id,
              customer_id,
              pizza_id,
              order_time,
              Substring_index(exclusions, ', ',1) AS exclusions ,
              Substring_index(extras, ', ',1)     AS extras
       FROM   customer_orders
       UNION
       SELECT order_id,
              customer_id,
              pizza_id,
              order_time,
              Substring_index(exclusions, ', ',-1) AS exclusions ,
              Substring_index(extras, ', ',-1)     AS extras
       FROM   customer_orders)

SELECT   t.topping_name,
         Count(cm.exclusions) AS count_of_pizzas
FROM     customer_orders_mod cm
JOIN     pizza_toppings t
ON       cm.exclusions = t.topping_id
WHERE    exclusions <> 0
GROUP BY 1
ORDER BY 2 DESC limit 1

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


topping_name,count_of_pizzas
Cheese,4


[Back to top](#Pizza-Runner-Case-Study:)

## C4. Generate an order item for each record in the customers_orders table in the format of one of the following:

- Meat Lovers
- Meat Lovers - Exclude Beef
- Meat Lovers - Extra Bacon
- Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers

In [82]:
%%sql

WITH customer_orders_mod
     AS (SELECT order_id,
                customer_id,
                pizza_id,
                order_time,
                Substring_index(exclusions, ', ', 1) AS exclusions,
                Substring_index(extras, ', ', 1)     AS extras
         FROM   customer_orders
         UNION
         SELECT order_id,
                customer_id,
                pizza_id,
                order_time,
                Substring_index(exclusions, ', ', -1) AS exclusions,
                Substring_index(extras, ', ', -1)     AS extras
         FROM   customer_orders),
     order_info
     AS (SELECT cm.order_id,
                cm.pizza_id,
                t.topping_name  AS exclusions,
                t1.topping_name AS extras,
                nm.pizza_name
         FROM   customer_orders_mod cm
                LEFT JOIN pizza_toppings t
                       ON cm.exclusions = t.topping_id
                LEFT JOIN pizza_toppings t1
                       ON cm.extras = t1.topping_id
                LEFT JOIN pizza_names nm
                       ON cm.pizza_id = nm.pizza_id),
     order_info_grouped
     AS (SELECT order_id,
                pizza_id,
                pizza_name,
                Group_concat(DISTINCT exclusions) AS exclusions,
                Group_concat(DISTINCT extras)     AS extras
         FROM   order_info
         GROUP  BY 1,
                   2,
                   3)
SELECT order_id,
       pizza_id,
       Concat(pizza_name, COALESCE(Concat(' | Exclude= ', exclusions), ""),
       COALESCE(
       Concat(' | Extra= ', extras), "")) AS pizza_name_with_exclusions_extras
FROM   order_info 

 * mysql://root:***@localhost:3306/uditdb
15 rows affected.


order_id,pizza_id,pizza_name_with_exclusions_extras
1,1,Meatlovers
2,1,Meatlovers
3,1,Meatlovers
3,2,Vegetarian
4,1,Meatlovers | Exclude= Cheese
4,2,Vegetarian | Exclude= Cheese
5,1,Meatlovers | Extra= Bacon
6,2,Vegetarian
7,2,Vegetarian | Extra= Bacon
8,1,Meatlovers


[Back to top](#Pizza-Runner-Case-Study:)

## C5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients

For example: "2xBacon, Beef, ... , Salami"

In [68]:
%%sql

WITH customer_orders_mod
     AS (SELECT Row_number()
                  over()                             AS rn,
                order_id,
                customer_id,
                pizza_id,
                order_time,
                Substring_index(exclusions, ', ', 1) AS exclusions,
                Substring_index(extras, ', ', 1)     AS extras
         FROM   customer_orders
         UNION
         SELECT Row_number()
                  over()                              AS rn,
                order_id,
                customer_id,
                pizza_id,
                order_time,
                Substring_index(exclusions, ', ', -1) AS exclusions,
                Substring_index(extras, ', ', -1)     AS extras
         FROM   customer_orders),
     orders_toppings
     AS (SELECT rn,
                order_id,
                cm.pizza_id,
                exclusions,
                extras,
                toppings
         FROM   customer_orders_mod cm
                join pizza_recipes_mod pm
                  ON cm.pizza_id = pm.pizza_id),
     orders_toppings_stacked
     AS (SELECT DISTINCT rn,
                         order_id,
                         pizza_id,
                         exclusions AS toppings
         FROM   orders_toppings
         UNION ALL
         SELECT DISTINCT rn,
                         order_id,
                         pizza_id,
                         extras AS toppings
         FROM   orders_toppings
         UNION ALL
         SELECT DISTINCT rn,
                         order_id,
                         pizza_id,
                         toppings
         FROM   orders_toppings),
     toppings_count
     AS (SELECT rn,
                order_id,
                pizza_id,
                topping_name,
                Count(toppings) AS count
         FROM   orders_toppings_stacked os
                join pizza_toppings pt
                  ON os.toppings = pt.topping_id
         GROUP  BY 1,2,3,4
         ORDER  BY 5 DESC)
SELECT rn
       AS
       row_num,
       order_id,
       pizza_id,
       Group_concat(IF(count = 2, Concat(count, 'x ', topping_name),
                    topping_name)) AS
       toppings
FROM   toppings_count
GROUP  BY 1,2,3 

 * mysql://root:***@localhost:3306/uditdb
14 rows affected.


row_num,order_id,pizza_id,toppings
1,1,1,"Bacon,BBQ Sauce,Beef,Cheese,Chicken,Mushrooms,Pepperoni,Salami"
2,2,1,"Bacon,BBQ Sauce,Beef,Cheese,Chicken,Mushrooms,Pepperoni,Salami"
3,3,1,"Bacon,BBQ Sauce,Beef,Cheese,Chicken,Mushrooms,Pepperoni,Salami"
4,3,2,"Cheese,Mushrooms,Onions,Peppers,Tomatoes,Tomato Sauce"
5,4,1,"2x Cheese,Bacon,BBQ Sauce,Beef,Chicken,Mushrooms,Pepperoni,Salami"
6,4,1,"2x Cheese,Bacon,BBQ Sauce,Beef,Chicken,Mushrooms,Pepperoni,Salami"
7,4,2,"2x Cheese,Mushrooms,Onions,Peppers,Tomatoes,Tomato Sauce"
8,5,1,"2x Bacon,BBQ Sauce,Beef,Cheese,Chicken,Mushrooms,Pepperoni,Salami"
9,6,2,"Cheese,Mushrooms,Onions,Peppers,Tomatoes,Tomato Sauce"
10,7,2,"Bacon,Cheese,Mushrooms,Onions,Peppers,Tomatoes,Tomato Sauce"


[Back to top](#Pizza-Runner-Case-Study:)

## C6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?

In [12]:
%%sql

WITH customer_orders_mod
     AS (SELECT Row_number()
                  OVER()                             AS rn,
                order_id,
                customer_id,
                pizza_id,
                order_time,
                Substring_index(exclusions, ', ', 1) AS exclusions,
                Substring_index(extras, ', ', 1)     AS extras
         FROM   customer_orders
         UNION
         SELECT Row_number()
                  OVER()                              AS rn,
                order_id,
                customer_id,
                pizza_id,
                order_time,
                Substring_index(exclusions, ', ', -1) AS exclusions,
                Substring_index(extras, ', ', -1)     AS extras
         FROM   customer_orders),
     orders_toppings
     AS (SELECT rn,
                order_id,
                cm.pizza_id,
                exclusions,
                extras,
                toppings
         FROM   customer_orders_mod cm
                JOIN pizza_recipes_mod pm
                  ON cm.pizza_id = pm.pizza_id),
     orders_toppings_stacked
     AS (SELECT DISTINCT rn,
                         order_id,
                         pizza_id,
                         exclusions AS toppings
         FROM   orders_toppings
         UNION ALL
         SELECT DISTINCT rn,
                         order_id,
                         pizza_id,
                         extras AS toppings
         FROM   orders_toppings
         UNION ALL
         SELECT DISTINCT rn,
                         order_id,
                         pizza_id,
                         toppings
         FROM   orders_toppings),
     toppings_count
     AS (SELECT rn,
                order_id,
                pizza_id,
                topping_name,
                Count(toppings) AS count
         FROM   orders_toppings_stacked os
                JOIN pizza_toppings pt
                  ON os.toppings = pt.topping_id
         GROUP  BY 1,2,3,4
         ORDER  BY 5 DESC)
SELECT topping_name,
       Sum(count) AS count
FROM   toppings_count
WHERE  order_id IN (SELECT order_id
                    FROM   runner_orders
                    WHERE  cancellation = "Not Cancelled")
GROUP  BY 1
ORDER  BY 2 DESC 

 * mysql://root:***@localhost:3306/uditdb
12 rows affected.


topping_name,count
Cheese,16
Mushrooms,13
Bacon,12
BBQ Sauce,10
Beef,9
Chicken,9
Pepperoni,9
Salami,9
Onions,3
Peppers,3


[Back to top](#Pizza-Runner-Case-Study:)

# D. Pricing and Ratings

## D1. If a Meat Lovers pizza costs dollar 12 and Vegetarian costs dollar 10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?

In [8]:
%%sql

WITH sales
     AS (SELECT pizza_name,
                CASE
                  WHEN pizza_name = 'Meatlovers' THEN Count(ord.pizza_id) * 12
                  ELSE Count(ord.pizza_id) * 10
                END AS amount
         FROM   customer_orders ord
                JOIN pizza_names pn
                  ON ord.pizza_id = pn.pizza_id
         WHERE  order_id IN (SELECT order_id
                             FROM   runner_orders
                             WHERE  cancellation = "Not Cancelled")
         GROUP  BY 1)
SELECT Sum(amount) AS total
FROM   sales

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


total
138


[Back to top](#Pizza-Runner-Case-Study:)

## D2. What if there was an additional \$1 charge for any pizza extras?

In [14]:
%%sql

WITH sales
     AS (SELECT pizza_name,
                CASE
                  WHEN pizza_name = 'Meatlovers' THEN Count(ord.pizza_id) * 12
                  ELSE Count(ord.pizza_id) * 10
                END
                -- 1 per extra: number of commas in the extras list, plus one
                + Sum(If(extras = 0, 0,
                         Length(extras) - Length(Replace(extras, ",", "")) + 1))
                AS amount
         FROM   customer_orders ord
                JOIN pizza_names pn
                  ON ord.pizza_id = pn.pizza_id
         WHERE  order_id IN (SELECT order_id
                             FROM   runner_orders
                             WHERE  cancellation = "Not Cancelled")
         GROUP  BY 1)
SELECT Sum(amount) AS total
FROM   sales 

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


total
142
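
The query above counts extras by comparing the length of the `extras` string with its length once the commas are removed. The following is a minimal sketch of that delimiter-counting trick in isolation. It runs the same `LENGTH`/`REPLACE` expression through Python's standard-library `sqlite3` module rather than MySQL (an assumption made only so the example is self-contained), and the sample strings are hypothetical, not rows from `customer_orders`.

```python
import sqlite3

# Hypothetical extras strings in the same "id, id, ..." shape as customer_orders.extras
samples = ["1", "1, 4", "1, 5, 6"]

conn = sqlite3.connect(":memory:")
# Items in a comma-separated list = number of commas + 1
counts = [
    conn.execute(
        "SELECT LENGTH(?) - LENGTH(REPLACE(?, ',', '')) + 1", (s, s)
    ).fetchone()[0]
    for s in samples
]
conn.close()

print(counts)  # [1, 2, 3]
```

The expression only works when every row holds a non-empty list, which is why the query guards it with `If(extras = 0, 0, ...)` for pizzas without extras.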


[Back to top](#Pizza-Runner-Case-Study:)

## D3. The Pizza Runner team now wants to add a ratings system that allows customers to rate their runner. How would you design an additional table for this new dataset? Generate a schema for this new table and insert your own rating, from 1 to 5, for each successful customer order.

In [33]:
%%sql

use uditdb;
DROP TABLE IF EXISTS runners_rating;

CREATE TABLE runners_rating
  (
     id          SERIAL PRIMARY KEY,
     order_id    INT,
     customer_id INT,
     runner_id   INT,
     rating      INT,
     rating_time DATETIME
  );

INSERT INTO runners_rating
            (order_id,
             customer_id,
             runner_id,
             rating,
             rating_time)
VALUES
  (1, 101, 1, 4, '2020-01-02 21:54:51'),
  (2, 101, 1, 3, '2020-01-04 23:06:03'),
  (3, 102, 1, 4, '2020-01-03 15:12:06'),
  (4, 103, 2, 5, '2020-01-05 19:47:06'),
  (5, 104, 3, 5, '2020-01-09 23:33:27'),
  (7, 105, 2, 3, '2020-01-10 23:57:12'),
  (8, 102, 2, 1, '2020-01-10 12:30:45'),
  (10, 104, 1, 5, '2020-01-13 20:05:35');

 * mysql://root:***@localhost:3306/uditdb
0 rows affected.
0 rows affected.
0 rows affected.
8 rows affected.


[]

In [34]:
%%sql

select * from runners_rating

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


id,order_id,customer_id,runner_id,rating,rating_time
1,1,101,1,4,2020-01-02 21:54:51
2,2,101,1,3,2020-01-04 23:06:03
3,3,102,1,4,2020-01-03 15:12:06
4,4,103,2,5,2020-01-05 19:47:06
5,5,104,3,5,2020-01-09 23:33:27
6,7,105,2,3,2020-01-10 23:57:12
7,8,102,2,1,2020-01-10 12:30:45
8,10,104,1,5,2020-01-13 20:05:35
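
The schema above stores `rating` as a plain `INT`, so nothing stops a value outside the 1 to 5 range from being inserted. One possible safeguard is a `CHECK` constraint. The sketch below shows the idea using Python's standard-library `sqlite3` module instead of MySQL, and the table name `runners_rating_demo` and its reduced column set are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of runners_rating with a range check on rating
conn.execute(
    """
    CREATE TABLE runners_rating_demo (
        id       INTEGER PRIMARY KEY,
        order_id INTEGER,
        rating   INTEGER CHECK (rating BETWEEN 1 AND 5)
    )
    """
)
conn.execute("INSERT INTO runners_rating_demo (order_id, rating) VALUES (1, 4)")

try:
    # A rating of 6 violates the CHECK constraint
    conn.execute("INSERT INTO runners_rating_demo (order_id, rating) VALUES (2, 6)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM runners_rating_demo").fetchone()[0]
conn.close()

print(rejected, row_count)  # True 1
```

The same `CHECK (rating BETWEEN 1 AND 5)` clause can be written in a MySQL `CREATE TABLE`, but MySQL only enforces it from version 8.0.16 onward, so whether it applies here depends on the server version.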


[Back to top](#Pizza-Runner-Case-Study:)

## D4. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?

- customer_id
- order_id
- runner_id
- rating
- order_time
- pickup_time
- Time between order and pickup
- Delivery duration
- Average speed
- Total number of pizzas

In [43]:
%%sql

SELECT rt.customer_id,
       rt.order_id,
       rt.runner_id,
       rt.rating                                            AS 'rating (out of 5)',
       ro.pickup_time,
       Timestampdiff(minute, co.order_time, ro.pickup_time) AS 'Time b/w order and pickup (in min)',
       Round(ro.duration)                                   AS 'Delivery duration (in min)',
       -- km per minute * 60 = km per hour
       Round(( ro.distance * 60 ) / ro.duration)            AS 'Average speed (in kmph)',
       Count(co.pizza_id)                                   AS 'Total number of pizzas'
FROM   runners_rating rt
       LEFT JOIN runner_orders ro
              ON rt.order_id = ro.order_id
       LEFT JOIN customer_orders co
              ON rt.order_id = co.order_id
GROUP  BY 1,2,3,4,5,6,7,8

 * mysql://root:***@localhost:3306/uditdb
8 rows affected.


customer_id,order_id,runner_id,rating (out of 5),pickup_time,Time b/w order and pickup (in min),Delivery duration (in min),Average speed (in kmph),Total number of pizzas
101,1,1,4,2020-01-01 18:15:34,10,32.0,38.0,1
101,2,1,3,2020-01-01 19:10:54,10,27.0,44.0,1
102,3,1,4,2020-01-03 00:12:37,21,20.0,40.0,2
103,4,2,5,2020-01-04 13:53:03,29,40.0,35.0,3
104,5,3,5,2020-01-08 21:10:57,10,15.0,40.0,1
105,7,2,3,2020-01-08 21:30:45,10,25.0,60.0,1
102,8,2,1,2020-01-10 00:15:02,20,15.0,94.0,1
104,10,1,5,2020-01-11 18:50:20,15,10.0,60.0,2


[Back to top](#Pizza-Runner-Case-Study:)

## D5. If a Meat Lovers pizza was \$12 and Vegetarian \$10, at fixed prices with no cost for extras, and each runner is paid \$0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?

In [92]:
%%sql

WITH travel_charge
     -- runner pay: 0.30 per km over all delivered orders
     AS (SELECT Sum(distance * 0.3) AS travel_charge
         FROM   runner_orders
         WHERE  cancellation = "Not Cancelled"),
     pizza_cost
     -- revenue: 12 per Meatlovers, 10 per Vegetarian, delivered orders only
     AS (SELECT Sum(CASE
                      WHEN pn.pizza_name = 'Meatlovers' THEN 12
                      ELSE 10
                    END) AS pizza_cost
         FROM   customer_orders co
                LEFT JOIN runner_orders AS ro
                       ON co.order_id = ro.order_id
                LEFT JOIN pizza_names pn
                       ON co.pizza_id = pn.pizza_id
         WHERE  ro.cancellation = "Not Cancelled")
SELECT pizza_cost - travel_charge AS profit
FROM   pizza_cost p
       CROSS JOIN travel_charge t

 * mysql://root:***@localhost:3306/uditdb
1 rows affected.


profit
94.44
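
As a sanity check on this figure, the arithmetic can be reproduced by hand. The sketch below assumes the eight delivered distances are the ones shown in the `runner_orders` listing in the Data Cleaning section (with the `km` suffix stripped) and reuses the D1 revenue of 138. It is a cross-check of the query result, not a replacement for it.

```python
# Distances (km) of the eight delivered orders, as listed in runner_orders
distances_km = [20, 20, 13.4, 23.4, 10, 25, 23.4, 10]

revenue = 138        # total from D1 (no charge for extras)
rate_per_km = 0.30   # runner pay per kilometre

travel_cost = round(sum(distances_km) * rate_per_km, 2)
profit = round(revenue - travel_cost, 2)

print(travel_cost, profit)  # 43.56 94.44
```

The runners cover 145.2 km in total, which costs 43.56 at 0.30 per kilometre, leaving 94.44 out of the 138 in revenue.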


[Back to top](#Pizza-Runner-Case-Study:)