## Case Study #2 - Pizza Runner

#### Problem Statement
Did you know that over 115 million kilograms of pizza is consumed daily worldwide??? (Well according to Wikipedia anyway…)

Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!”

Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so Pizza Runner was launched!

Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

#### Entity Relationship Diagram

![week2.png](week2.png)

Import modules

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlite3 as sql
pd.set_option('display.max_columns', None)

Initialize SQL

In [2]:
conn = sql.connect("week2.db")
cursor = conn.cursor() 
if os.stat("week2.db").st_size == 0:
    with open('week2-sql.txt','r') as file:
        script = file.read()
        script = script.replace('\n', ' ')
    cursor.executescript(script)

Verify tables

In [3]:
query = """SELECT name FROM sqlite_master WHERE type='table';"""
cursor.execute(query)
tables = [table[0] for table in cursor.fetchall()]
# Double quotes outside so the nested ', ' also parses on Python versions before 3.12
print(f"The tables in the database are: {', '.join(tables)}")

The tables in the database are: runners, customer_orders, runner_orders, pizza_names, pizza_recipes, pizza_toppings


Fetch table information

In [4]:
for table in tables:
    print("=================================")
    print(f'Table [{table}]')
    df = pd.read_sql_query(f'SELECT * FROM {table}', conn)
    print(f'Dimensions: {df.shape[0]} rows x {df.shape[1]} columns\n')
    print(df.head())
    info_df = pd.DataFrame.from_dict({'Datatypes':df.dtypes, 'NULL count':df.isna().sum()})
    print()
    print(info_df)
    print()

Table [runners]
Dimensions: 4 rows x 2 columns

   runner_id registration_date
0          1        2021-01-01
1          2        2021-01-03
2          3        2021-01-08
3          4        2021-01-15

                  Datatypes  NULL count
runner_id             int64           0
registration_date    object           0

Table [customer_orders]
Dimensions: 14 rows x 6 columns

   order_id  customer_id  pizza_id exclusions extras           order_time
0         1          101         1                    2020-01-01 18:05:02
1         2          101         1                    2020-01-01 19:00:52
2         3          102         1                    2020-01-02 23:51:23
3         3          102         2              None  2020-01-02 23:51:23
4         4          103         1          4         2020-01-04 13:23:46

            Datatypes  NULL count
order_id        int64           0
customer_id     int64           0
pizza_id        int64           0
exclusions     object           0
ext

In [5]:
def query(stmt: str) -> pd.DataFrame:
    """Execute a SQL statement and return the results as a pandas DataFrame.
    
    Parameters
    ----------
    stmt: str
        The SQL statement to be executed

    Returns
    -------
    pd.DataFrame
        The rows returned by the statement
    """
    # conn is the module-level connection opened in the "Initialize SQL" cell
    return pd.read_sql_query(stmt, conn)

## Case Study Questions

**Data Cleaning**

Q1: Investigate your data and do the necessary data adjustments and cleaning.

In [6]:
# Check customer_orders table
query("SELECT * FROM customer_orders")

Unnamed: 0,order_id,customer_id,pizza_id,exclusions,extras,order_time
0,1,101,1,,,2020-01-01 18:05:02
1,2,101,1,,,2020-01-01 19:00:52
2,3,102,1,,,2020-01-02 23:51:23
3,3,102,2,,,2020-01-02 23:51:23
4,4,103,1,4,,2020-01-04 13:23:46
5,4,103,1,4,,2020-01-04 13:23:46
6,4,103,2,4,,2020-01-04 13:23:46
7,5,104,1,,1,2020-01-08 21:00:29
8,6,101,2,,,2020-01-08 21:03:13
9,7,105,2,,1,2020-01-08 21:20:29


Note: The exclusions and extras columns contain both blank strings ('') and the literal string 'null'. We unify both to NULL with a CASE expression.

In [7]:
script = """
    DROP TABLE IF EXISTS customer_orders_clean;
    CREATE TEMP TABLE customer_orders_clean AS
    SELECT
        order_id,
        customer_id,
        pizza_id,
        CASE
            -- single quotes: in SQL, double quotes denote identifiers, not string literals
            WHEN exclusions IS NULL OR exclusions = '' OR exclusions LIKE 'null' THEN NULL
            ELSE exclusions
            END AS exclusions,
        CASE
            WHEN extras IS NULL OR extras = '' OR extras LIKE 'null' THEN NULL
            ELSE extras
            END AS extras,
            order_time
        FROM customer_orders;
"""
cursor.executescript(script)
# Verify result
query("SELECT * FROM customer_orders_clean")

Unnamed: 0,order_id,customer_id,pizza_id,exclusions,extras,order_time
0,1,101,1,,,2020-01-01 18:05:02
1,2,101,1,,,2020-01-01 19:00:52
2,3,102,1,,,2020-01-02 23:51:23
3,3,102,2,,,2020-01-02 23:51:23
4,4,103,1,4,,2020-01-04 13:23:46
5,4,103,1,4,,2020-01-04 13:23:46
6,4,103,2,4,,2020-01-04 13:23:46
7,5,104,1,,1,2020-01-08 21:00:29
8,6,101,2,,,2020-01-08 21:03:13
9,7,105,2,,1,2020-01-08 21:20:29


In [8]:
# Check runner_orders table
query("SELECT * FROM runner_orders")

Unnamed: 0,order_id,runner_id,pickup_time,distance,duration,cancellation
0,1,1,2020-01-01 18:15:34,20km,32 minutes,
1,2,1,2020-01-01 19:10:54,20km,27 minutes,
2,3,1,2020-01-03 00:12:37,13.4km,20 mins,
3,4,2,2020-01-04 13:53:03,23.4,40,
4,5,3,2020-01-08 21:10:57,10,15,
5,6,3,,,,Restaurant Cancellation
6,7,2,2020-01-08 21:30:45,25km,25mins,
7,8,2,2020-01-10 00:15:02,23.4 km,15 minute,
8,9,2,,,,Customer Cancellation
9,10,1,2020-01-11 18:50:20,10km,10minutes,


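Note: Several columns are inconsistent. pickup_time contains the string 'null'; distance mixes 'null', bare numbers, and values with a 'km' suffix; duration mixes 'null', bare numbers, and the suffixes 'minutes', 'minute', and 'mins'; cancellation contains both 'null' and blank strings ('').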

In [9]:
script = """
    DROP TABLE IF EXISTS runner_orders_clean;
    CREATE TEMP TABLE runner_orders_clean AS
    SELECT 
    order_id, 
    runner_id,  
    CASE
        WHEN pickup_time LIKE 'null' THEN NULL
        ELSE pickup_time
        END AS pickup_time,
    CASE
        WHEN distance LIKE 'null' THEN NULL
        -- the space in the trim set also strips the gap in values such as '23.4 km'
        WHEN distance LIKE '%km' THEN TRIM(distance, 'km ')
        ELSE distance 
        END AS distance,
    CASE
        WHEN duration LIKE 'null' THEN NULL
        WHEN duration LIKE '%mins' THEN TRIM(duration, 'mins ')
        WHEN duration LIKE '%minute' THEN TRIM(duration, 'minute ')
        WHEN duration LIKE '%minutes' THEN TRIM(duration, 'minutes ')
        ELSE duration
        END AS duration,
    CASE
        WHEN cancellation = '' OR cancellation LIKE 'null' THEN NULL
        ELSE cancellation
        END AS cancellation
    FROM runner_orders
"""
cursor.executescript(script)
# Verify result
query("SELECT * FROM runner_orders_clean")

Unnamed: 0,order_id,runner_id,pickup_time,distance,duration,cancellation
0,1,1,2020-01-01 18:15:34,20.0,32.0,
1,2,1,2020-01-01 19:10:54,20.0,27.0,
2,3,1,2020-01-03 00:12:37,13.4,20.0,
3,4,2,2020-01-04 13:53:03,23.4,40.0,
4,5,3,2020-01-08 21:10:57,10.0,15.0,
5,6,3,,,,Restaurant Cancellation
6,7,2,2020-01-08 21:30:45,25.0,25.0,
7,8,2,2020-01-10 00:15:02,23.4,15.0,
8,9,2,,,,Customer Cancellation
9,10,1,2020-01-11 18:50:20,10.0,10.0,


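Change the declared types of pickup_time, distance, and duration from text to DATETIME, FLOAT, and INT respectively. Note that rewriting the schema entry this way changes only the declaration; values already stored in the table are not converted.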

In [10]:
script = """
    PRAGMA writable_schema = 1; 
    UPDATE SQLITE_MASTER 
    SET SQL = 
        'CREATE TEMP TABLE runner_orders_clean (
            order_id INT NOT NULL, 
            runner_id INT NOT NULL,
            pickup_time DATETIME,
            distance FLOAT,
            duration INT,
            cancellation VARCHAR
         )' 
    WHERE NAME = 'runner_orders_clean';
    PRAGMA writable_schema = 0;
"""
cursor.executescript(script)

<sqlite3.Cursor at 0x2265db202c0>
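
Editing the schema table by hand is fragile, so a possible alternative is to convert the values while creating the cleaned table. The sketch below is not the notebook's method: it uses a throwaway in-memory database, a hypothetical table name `runner_orders_typed`, and made-up rows that only mimic the raw formats seen above. It combines `NULLIF`, `TRIM`, and `CAST`, all built-in SQLite functions.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE runner_orders (order_id INT, distance TEXT, duration TEXT);
    -- Hypothetical rows mimicking the raw formats (not the real dataset).
    INSERT INTO runner_orders VALUES
        (1, '20km', '32 minutes'),
        (2, '23.4 km', '15 minute'),
        (3, 'null', 'null'),
        (4, '10', '10minutes');

    CREATE TEMP TABLE runner_orders_typed AS
    SELECT
        order_id,
        -- NULLIF maps the literal 'null' to NULL; TRIM and CAST pass NULL through.
        CAST(TRIM(NULLIF(distance, 'null'), 'km ') AS REAL) AS distance,
        CAST(TRIM(NULLIF(duration, 'null'), 'minutes ') AS INTEGER) AS duration
    FROM runner_orders;
""")

typed_rows = conn_demo.execute("""
    SELECT order_id, distance, duration, typeof(distance), typeof(duration)
    FROM runner_orders_typed
    ORDER BY order_id
""").fetchall()

for row in typed_rows:
    print(row)
```

Because the values are stored as numbers, later comparisons and aggregates no longer rely on implicit text-to-number conversion.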

In [11]:
query("SELECT * FROM runner_orders_clean")

Unnamed: 0,order_id,runner_id,pickup_time,distance,duration,cancellation
0,1,1,2020-01-01 18:15:34,20.0,32.0,
1,2,1,2020-01-01 19:10:54,20.0,27.0,
2,3,1,2020-01-03 00:12:37,13.4,20.0,
3,4,2,2020-01-04 13:53:03,23.4,40.0,
4,5,3,2020-01-08 21:10:57,10.0,15.0,
5,6,3,,,,Restaurant Cancellation
6,7,2,2020-01-08 21:30:45,25.0,25.0,
7,8,2,2020-01-10 00:15:02,23.4,15.0,
8,9,2,,,,Customer Cancellation
9,10,1,2020-01-11 18:50:20,10.0,10.0,


**A. Pizza Metrics**

Q2: How many pizzas were ordered?

In [12]:
query("""
    SELECT COUNT(*) as pizza_order_count
    FROM customer_orders_clean
""")

Unnamed: 0,pizza_order_count
0,14


Q3: How many unique customer orders were made?


In [13]:
query("""
    SELECT COUNT(DISTINCT order_id) as unique_customer_orders
    FROM customer_orders_clean
""")

Unnamed: 0,unique_customer_orders
0,10


Q4: How many successful orders were delivered by each runner?


In [14]:
query("""
    SELECT 
        runner_id, 
        COUNT(order_id) AS successful_orders
    FROM runner_orders_clean
    WHERE distance <> 0
    GROUP BY runner_id;
""")

Unnamed: 0,runner_id,successful_orders
0,1,4
1,2,3
2,3,1


Q5: How many of each type of pizza was delivered?


In [15]:
query("""
    SELECT 
        p.pizza_name, 
        COUNT(c.pizza_id) AS delivered_pizza_count
    FROM customer_orders_clean AS c
    JOIN runner_orders_clean AS r
        ON c.order_id = r.order_id
    JOIN pizza_names AS p
        ON c.pizza_id = p.pizza_id
    WHERE r.distance <> 0
    GROUP BY p.pizza_name;
""")

Unnamed: 0,pizza_name,delivered_pizza_count
0,Meatlovers,9
1,Vegetarian,3


Q6: How many Vegetarian and Meatlovers were ordered by each customer?


In [16]:
query("""
    SELECT
        co.customer_id, pn.pizza_name, COUNT(*) as num_orders
    FROM
        customer_orders co
    INNER JOIN pizza_names pn
        ON co.pizza_id = pn.pizza_id
    GROUP BY co.customer_id, pn.pizza_name
""")

Unnamed: 0,customer_id,pizza_name,num_orders
0,101,Meatlovers,2
1,101,Vegetarian,1
2,102,Meatlovers,2
3,102,Vegetarian,1
4,103,Meatlovers,3
5,103,Vegetarian,1
6,104,Meatlovers,3
7,105,Vegetarian,1


Q7: What was the maximum number of pizzas delivered in a single order?


In [17]:
query("""
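    WITH order_counts AS (
        SELECT 
            co.order_id, COUNT(*) as pizza_count
        FROM
            customer_orders_clean co
            INNER JOIN runner_orders_clean ro
                ON co.order_id = ro.order_id
        WHERE
            ro.distance IS NOT NULL  -- keep delivered orders only
        GROUP BY co.order_id
    )
      
    SELECT
        MAX(pizza_count) AS max_pizza_count
    FROM
        order_counts      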
""")

Unnamed: 0,max_pizza_count
0,3


Q8: For each customer, how many delivered pizzas had at least 1 change and how many had no changes?


Note: Here, a change means a pizza ordered with at least one exclusion or extra. The query below counts every ordered pizza; it does not filter out cancelled orders.

In [18]:
query("""
    SELECT 
        co.customer_id,
        SUM(CASE
            WHEN co.exclusions IS NOT NULL OR co.extras IS NOT NULL THEN 1 
            ELSE 0 END) AS total_with_changes,
        SUM(CASE
            WHEN co.exclusions IS NULL AND co.extras IS NULL THEN 1 
            ELSE 0 END) AS total_no_changes
    FROM
        customer_orders_clean co
    GROUP BY co.customer_id
""")

Unnamed: 0,customer_id,total_with_changes,total_no_changes
0,101,0,3
1,102,0,3
2,103,4,0
3,104,2,1
4,105,1,0


Q9: How many pizzas were delivered that had both exclusions and extras?

Note: The query below counts every ordered pizza, including those in cancelled orders.


In [19]:
query("""
    SELECT
        SUM(CASE
            WHEN exclusions IS NOT NULL AND extras IS NOT NULL THEN 1 ELSE 0 
        END) AS total_with_both_exclusions_and_extras
    FROM
        customer_orders_clean    
""")

Unnamed: 0,total_with_both_exclusions_and_extras
0,2


Q10: What was the total volume of pizzas ordered for each hour of the day?


In [20]:
query("""
    SELECT
        strftime('%H', order_time) AS order_hour,
        COUNT(*) AS total_pizzas
    FROM
        customer_orders_clean
    GROUP BY
        order_hour
""")

Unnamed: 0,order_hour,total_pizzas
0,11,1
1,13,3
2,18,3
3,19,1
4,21,3
5,23,3


Q11: What was the volume of orders for each day of the week?


In [21]:
query("""
    SELECT
        SUBSTR('SunMonTueWedThuFriSat', 1 + 3*strftime('%w', order_time), 3) AS order_day,
        COUNT(*) AS total_pizzas
    FROM
        customer_orders_clean
    GROUP BY order_day
    ORDER BY total_pizzas DESC
""")

Unnamed: 0,order_day,total_pizzas
0,Wed,5
1,Sat,5
2,Thu,3
3,Fri,1


**B. Runner and Customer Experience**

Q12: How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)


In [22]:
query("""
    SELECT 
        strftime('%W', registration_date) as week_number,
        COUNT(*) as signup_count
    FROM
        runners
    GROUP BY
        week_number
""")

Unnamed: 0,week_number,signup_count
0,0,2
1,1,1
2,2,1
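
`strftime('%W', ...)` numbers weeks from the first Monday of the year, whereas the question defines the first week as starting on 2021-01-01. The counts happen to agree for this data. As a sketch of the question's definition, the example below buckets each registration date by whole weeks elapsed since 2021-01-01. It runs on an in-memory copy of the four `runners` rows shown earlier; the connection name `conn_weeks` is an assumption made for illustration.

```python
import sqlite3

conn_weeks = sqlite3.connect(":memory:")
conn_weeks.executescript("""
    CREATE TABLE runners (runner_id INTEGER, registration_date DATE);
    INSERT INTO runners VALUES
        (1, '2021-01-01'), (2, '2021-01-03'),
        (3, '2021-01-08'), (4, '2021-01-15');
""")

signups = conn_weeks.execute("""
    SELECT
        -- whole weeks since the start date, shifted so the first week is 1
        CAST((JULIANDAY(registration_date) - JULIANDAY('2021-01-01')) / 7 AS INTEGER) + 1
            AS week_number,
        COUNT(*) AS signup_count
    FROM runners
    GROUP BY week_number
    ORDER BY week_number
""").fetchall()

print(signups)
```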


Q13: What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?

In [23]:
query("""
    SELECT 
        ro.runner_id,
        AVG((JULIANDAY(ro.pickup_time) - JULIANDAY(co.order_time)) * 24 * 60) AS average_time
    FROM
        customer_orders_clean co
        INNER JOIN runner_orders_clean ro
            ON co.order_id = ro.order_id
    WHERE
        ro.distance IS NOT NULL
    GROUP BY
        ro.runner_id
""")

Unnamed: 0,runner_id,average_time
0,1,15.677778
1,2,23.72
2,3,10.466667
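
Because `customer_orders_clean` holds one row per pizza, the join above weights each order by its pizza count. If a per-order average is wanted instead, one option is to deduplicate the orders first. The sketch below contrasts the two on hypothetical rows in an in-memory database (the data and the connection name `conn_avg` are made up for illustration).

```python
import sqlite3

conn_avg = sqlite3.connect(":memory:")
conn_avg.executescript("""
    CREATE TABLE customer_orders_clean (order_id INT, order_time TEXT);
    CREATE TABLE runner_orders_clean (
        order_id INT, runner_id INT, pickup_time TEXT, distance REAL);
    -- Hypothetical data: order 2 contains two pizzas, so it appears twice.
    INSERT INTO customer_orders_clean VALUES
        (1, '2020-01-01 18:00:00'),
        (2, '2020-01-01 19:00:00'),
        (2, '2020-01-01 19:00:00');
    INSERT INTO runner_orders_clean VALUES
        (1, 1, '2020-01-01 18:10:00', 20.0),
        (2, 1, '2020-01-01 19:20:00', 13.4);
""")

MINUTES = "(JULIANDAY(ro.pickup_time) - JULIANDAY(o.order_time)) * 24 * 60"

# One row per pizza: the 20-minute order is counted twice.
per_pizza = conn_avg.execute(f"""
    SELECT ro.runner_id, ROUND(AVG({MINUTES}), 2)
    FROM customer_orders_clean o
    JOIN runner_orders_clean ro ON o.order_id = ro.order_id
    WHERE ro.distance IS NOT NULL
    GROUP BY ro.runner_id
""").fetchall()

# One row per order: deduplicate before joining.
per_order = conn_avg.execute(f"""
    WITH orders AS (
        SELECT DISTINCT order_id, order_time FROM customer_orders_clean
    )
    SELECT ro.runner_id, ROUND(AVG({MINUTES}), 2)
    FROM orders o
    JOIN runner_orders_clean ro ON o.order_id = ro.order_id
    WHERE ro.distance IS NOT NULL
    GROUP BY ro.runner_id
""").fetchall()

print(per_pizza, per_order)
```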


Q14: Is there any relationship between the number of pizzas and how long the order takes to prepare?

In [24]:
query("""
    WITH prep_times AS(
        SELECT 
            co.order_id,
            COUNT(*) as pizza_count,
            (JULIANDAY(ro.pickup_time) - JULIANDAY(co.order_time)) * 24 * 60 AS prep_time
        FROM
            customer_orders_clean co
            INNER JOIN runner_orders_clean ro
                ON co.order_id = ro.order_id
        WHERE
            ro.distance IS NOT NULL
        GROUP BY
            co.order_id
    )

    SELECT 
        pizza_count,
        AVG(prep_time) AS avg_prep_time,
        AVG(prep_time) / pizza_count AS avg_prep_time_per_pizza 
    FROM
        prep_times
    GROUP BY
        pizza_count
""")

Unnamed: 0,pizza_count,avg_prep_time,avg_prep_time_per_pizza
0,1,12.356667,12.356667
1,2,18.375,9.1875
2,3,29.283333,9.761111


Observation: 
- There is a positive correlation between the number of pizzas in an order and the preparation time for the order. The average preparation time per pizza is shortest for orders with 2 pizzas, and longest for orders with 1 pizza.
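
As a rough check on the observation, the Pearson correlation can be computed from the three aggregated rows in the table above. This is only a sketch: three averaged points are not a statistical test, and the helper `pearson` is written here for illustration.

```python
from math import sqrt

# Values copied from the query result above.
pizza_count = [1, 2, 3]
avg_prep_time = [12.356667, 18.375, 29.283333]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(pizza_count, avg_prep_time)
per_pizza = [t / n for t, n in zip(avg_prep_time, pizza_count)]

print(f"correlation: {r:.3f}")
print("minutes per pizza:", [round(v, 2) for v in per_pizza])
```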

Q15: What was the average distance travelled for each customer?


In [25]:
query("""
    SELECT
        co.customer_id,
        AVG(ro.distance)
    FROM
        customer_orders_clean co
        INNER JOIN runner_orders_clean ro
            ON co.order_id = ro.order_id
    WHERE
        ro.distance IS NOT NULL
    GROUP BY
        co.customer_id
""")

Unnamed: 0,customer_id,AVG(ro.distance)
0,101,20.0
1,102,16.733333
2,103,23.4
3,104,10.0
4,105,25.0


Q16: What was the difference between the longest and shortest delivery times for all orders?


In [26]:
query("""
    SELECT
        MAX(duration) AS longest_time, 
        MIN(duration) AS shortest_time,
        MAX(duration) - MIN(duration) AS 'difference (mins)'
    FROM
        runner_orders_clean
    WHERE
        duration IS NOT NULL
""")

Unnamed: 0,longest_time,shortest_time,difference (mins)
0,40,10,30


Q17: What was the average speed for each runner for each delivery and do you notice any trend for these values?


In [27]:
query("""
    SELECT 
        ro.runner_id,
        co.order_id,
        COUNT(co.order_id) as pizza_count,
        ro.distance AS 'distance (km)',
        ROUND(ro.duration / 60.0, 2) AS 'duration (hr)',
        ROUND(ro.distance / (ro.duration / 60.0), 2) AS avg_speed_kph
    FROM
        customer_orders_clean co
        INNER JOIN runner_orders_clean ro
            ON co.order_id = ro.order_id
    WHERE
        ro.duration IS NOT NULL
    GROUP BY ro.runner_id, ro.order_id
""")

Unnamed: 0,runner_id,order_id,pizza_count,distance (km),duration (hr),avg_speed_kph
0,1,1,1,20.0,0.53,37.5
1,1,2,1,20.0,0.45,44.44
2,1,3,2,13.4,0.33,40.2
3,1,10,2,10.0,0.17,60.0
4,2,4,3,23.4,0.67,35.1
5,2,7,1,25.0,0.42,60.0
6,2,8,1,23.4,0.25,93.6
7,3,5,1,10.0,0.25,40.0


In [28]:
# Analysis of delivery speeds per runner
query("""
    SELECT
        runner_id,
        COUNT(*) AS delivery_count,
        ROUND(AVG(ro.distance / (ro.duration / 60.0)), 2) AS avg_speed,
        ROUND(MAX(ro.distance / (ro.duration / 60.0) ) - MIN(ro.distance / (ro.duration / 60.0)) , 2)
            AS range_speed
    FROM
        runner_orders_clean ro
    WHERE
        ro.distance IS NOT NULL
    GROUP BY
        runner_id
""")

Unnamed: 0,runner_id,delivery_count,avg_speed,range_speed
0,1,4,45.54,22.5
1,2,3,62.9,58.5
2,3,1,40.0,0.0


Observations: 
- Runners 1 and 3 have similar average speeds (roughly 40-45 kph), in line with most of the individual delivery speeds in the previous table.
- Runner 2 has the highest average speed (62.9 kph) and by far the widest range (58.5 kph), driven by the 93.6 kph delivery for order 8. A speed that high may point to an inaccurate distance or duration record rather than to the runner's actual performance.

Q18: What is the successful delivery percentage for each runner?


In [29]:
# Include runners with no delivery -> "No data"
query("""
    SELECT
        r.runner_id,
        -- 100.0 forces floating-point division; with integers SQLite would truncate the result
        ROUND(100.0 * SUM(
            CASE WHEN ro.distance IS NOT NULL THEN 1 ELSE 0 END
        ) / COUNT(*), 2) AS success_percentage
    FROM
        runners r
        JOIN runner_orders_clean ro
            ON r.runner_id = ro.runner_id
    GROUP BY r.runner_id
    UNION
    SELECT
        runner_id,
        'No data' AS success_percentage
    FROM
        runners r
    WHERE
        runner_id NOT IN 
            (SELECT DISTINCT runner_id from runner_orders_clean)
""")

Unnamed: 0,runner_id,success_percentage
0,1,100.0
1,2,75.0
2,3,50.0
3,4,No data


**C. Ingredient Optimization**

Q19: What are the standard ingredients for each pizza?


In [30]:
query("""
    WITH RECURSIVE split(pizza_id, toppings, str) AS (
        SELECT pizza_id, '', toppings||',' FROM pizza_recipes
        UNION ALL SELECT
        pizza_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split WHERE str<>''
    ) 
    SELECT 
        pn.pizza_name, pt.topping_name
    FROM 
        split
        INNER JOIN pizza_names pn ON split.pizza_id = pn.pizza_id
        INNER JOIN pizza_toppings pt ON split.toppings = pt.topping_id
    WHERE 
        split.toppings <> ''
    ORDER BY
        pn.pizza_id, pt.topping_id
""")

Unnamed: 0,pizza_name,topping_name
0,Meatlovers,Bacon
1,Meatlovers,BBQ Sauce
2,Meatlovers,Beef
3,Meatlovers,Cheese
4,Meatlovers,Chicken
5,Meatlovers,Mushrooms
6,Meatlovers,Pepperoni
7,Meatlovers,Salami
8,Vegetarian,Cheese
9,Vegetarian,Mushrooms


Q20: What was the most commonly added extra?

In [31]:
query("""
    WITH RECURSIVE split(order_id, extras, str) AS (
        SELECT order_id, '', extras||',' FROM customer_orders_clean
        UNION ALL SELECT
        order_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split WHERE str <> ''
    ) 
    SELECT 
        pt.topping_name, COUNT(*) as count
    FROM 
        split
        INNER JOIN pizza_toppings pt ON split.extras = pt.topping_id
    WHERE 
        split.extras <> ''
    GROUP BY pt.topping_name
    ORDER BY count DESC
    LIMIT 1
""")

Unnamed: 0,topping_name,count
0,Bacon,4


Q21: What was the most common exclusion?


In [32]:
query("""
    WITH RECURSIVE split(order_id, exclusions, str) AS (
        SELECT order_id, '', exclusions||',' FROM customer_orders_clean
        UNION ALL SELECT
        order_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split WHERE str <> ''
    ) 
    SELECT 
        pt.topping_name, COUNT(*) as count
    FROM 
        split
        INNER JOIN pizza_toppings pt ON split.exclusions = pt.topping_id
    WHERE 
        split.exclusions <> ''
    GROUP BY pt.topping_name
    ORDER BY count DESC
    LIMIT 1
""")

Unnamed: 0,topping_name,count
0,Cheese,4


Q22: What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?

In [33]:
# Create temp tables holding the split toppings from the pizza_recipes table and the split
# extras and exclusions columns of the customer_orders_clean table.
cursor.executescript("""
    CREATE TEMP TABLE temp1 AS
    WITH RECURSIVE split(pizza_id, toppings, str) AS (
        SELECT pizza_id, '', toppings||',' FROM pizza_recipes
        UNION ALL SELECT
        pizza_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split WHERE str<>''
    )
    SELECT * FROM split;
    
    CREATE TEMP TABLE temp2 AS
    WITH RECURSIVE split_extras(order_id, extras, str) AS (
        SELECT order_id, '', extras||',' FROM customer_orders_clean
        UNION ALL SELECT
        order_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split_extras WHERE str<>''
    )
    SELECT * FROM split_extras;
      
    CREATE TEMP TABLE temp3 AS
    WITH RECURSIVE split_exclusions(order_id, exclusions, str) AS (
        SELECT order_id, '', exclusions||',' FROM customer_orders_clean
        UNION ALL SELECT
        order_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split_exclusions WHERE str<>''
    )
    SELECT * FROM split_exclusions;
""")  

<sqlite3.Cursor at 0x2265db202c0>

In [34]:
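# Combine the temp tables: UNION ALL adds the extras and EXCEPT removes the exclusions.
# Caveat: EXCEPT returns distinct rows, so repeated (order_id, topping_id) pairs collapse
# to one, and cancelled orders are not filtered out here.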
query("""
    WITH orders_with_toppings AS (
        SELECT 
            co.order_id, s.toppings AS topping_id
        FROM 
            temp1 s
            INNER JOIN customer_orders_clean co
            ON s.pizza_id = co.pizza_id
        WHERE 
            s.toppings <> ''
      
        UNION ALL
        
        SELECT
            se.order_id, se.extras AS topping_id
        FROM
            temp2 se
        WHERE
            se.extras <> ''
      
        EXCEPT
      
        SELECT
            sx.order_id, sx.exclusions AS topping_id
        FROM
            temp3 sx
        WHERE
            sx.exclusions <> '' 
    )
      
    SELECT
        pt.topping_name, COUNT(*) as total_quantity
    FROM
        orders_with_toppings owt
        INNER JOIN pizza_toppings pt
            ON owt.topping_id = pt.topping_id
    GROUP BY pt.topping_id
    ORDER BY total_quantity DESC
""")

Unnamed: 0,topping_name,total_quantity
0,Cheese,11
1,Mushrooms,9
2,Bacon,9
3,Salami,8
4,Pepperoni,8
5,Chicken,8
6,Beef,8
7,BBQ Sauce,8
8,Tomato Sauce,4
9,Tomatoes,4
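
One thing to keep in mind when reading these totals: SQLite evaluates compound operators left to right, and `EXCEPT` returns distinct rows, so duplicates kept by `UNION ALL` are lost again. The sketch below shows the effect on hypothetical rows, followed by one possible alternative that keeps multiplicity by summing signed quantities (+1 for a recipe topping or extra, -1 for an exclusion). The CTE name `signed` and the sample values are assumptions for illustration.

```python
import sqlite3

conn_set = sqlite3.connect(":memory:")

# Two identical (order_id, topping_id) rows go in, but EXCEPT deduplicates the result.
except_rows = conn_set.execute("""
    SELECT 1 AS order_id, 4 AS topping_id
    UNION ALL SELECT 1, 4
    UNION ALL SELECT 1, 6
    EXCEPT SELECT 1, 6
""").fetchall()

# Alternative sketch: keep every row and net out exclusions with a signed quantity.
signed_rows = conn_set.execute("""
    WITH signed(topping_id, qty) AS (
        VALUES (4, 1), (4, 1),   -- topping 4 on two pizzas
               (6, 1), (6, -1)   -- topping 6 on one pizza, then excluded
    )
    SELECT topping_id, SUM(qty) AS total_quantity
    FROM signed
    GROUP BY topping_id
    HAVING SUM(qty) > 0
    ORDER BY topping_id
""").fetchall()

print(except_rows)
print(signed_rows)
```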


**D. Pricing and Ratings**

Q23: If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?


In [35]:
query("""
    SELECT
        SUM(CASE
            WHEN co.pizza_id = 1 THEN 12
            WHEN co.pizza_id = 2 THEN 10
            ELSE 0 END
            ) AS total_earnings
    FROM
        customer_orders_clean co
        INNER JOIN runner_orders_clean ro
            ON co.order_id = ro.order_id
    WHERE
        ro.distance <> 0
""")

Unnamed: 0,total_earnings
0,138


Q24: Refer to Q23, what if there was an additional $1 charge for any pizza extras? Example: Add cheese is $1 extra

In [36]:
query("""
    WITH RECURSIVE split(order_id, extras, str) AS (
        SELECT order_id, '', extras||',' FROM customer_orders_clean
        UNION ALL SELECT
        order_id,
        substr(str, 0, instr(str, ',')),
        substr(str, instr(str, ',')+1)
        FROM split WHERE str <> ''
    ),
      
    pizza_earnings AS( 
        SELECT
            SUM(CASE
                WHEN co.pizza_id = 1 THEN 12
                WHEN co.pizza_id = 2 THEN 10
                ELSE 0 END
                ) AS total_pizza_earnings
        FROM
            customer_orders_clean co
            INNER JOIN runner_orders_clean ro
                ON co.order_id = ro.order_id
        WHERE
            ro.distance <> 0
    ),
      
    topping_earnings AS(
        SELECT
            COUNT(*) as total_topping_earnings
        FROM
            split
             INNER JOIN runner_orders_clean ro
                ON split.order_id = ro.order_id
        WHERE
            ro.distance <> 0
            AND split.extras <> ''
    )
      
    SELECT
        pe.total_pizza_earnings + te.total_topping_earnings 
            AS total_earnings_with_extras
    FROM
        pizza_earnings pe, topping_earnings te
""")

Unnamed: 0,total_earnings_with_extras
0,142


Note: 
- Orders with unsuccessful deliveries are not included. This includes Order #9 with 2 extras. Hence, only $4 was earned in total from the extras.

Q25: The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.


In [38]:
cursor.executescript("""
    DROP TABLE IF EXISTS ratings;
    CREATE TEMP TABLE ratings(
        rating_id INTEGER NOT NULL,
        order_id INTEGER NOT NULL,
        rating INTEGER NOT NULL,
        rating_time TIMESTAMP NOT NULL
    );
    INSERT INTO ratings (rating_id, order_id, rating, rating_time)
    VALUES
        (1,1,5,'2021-02-01 10:00:00'),
        (2,4,2,'2021-02-02 10:00:00'),
        (3,5,3,'2021-02-03 10:00:00'),
        (4,7,2,'2021-02-04 10:00:00'),
        (5,10,4,'2021-02-05 10:00:00')
""")

<sqlite3.Cursor at 0x2265db202c0>

In [39]:
query("SELECT * FROM ratings")

Unnamed: 0,rating_id,order_id,rating,rating_time
0,1,1,5,2021-02-01 10:00:00
1,2,4,2,2021-02-02 10:00:00
2,3,5,3,2021-02-03 10:00:00
3,4,7,2,2021-02-04 10:00:00
4,5,10,4,2021-02-05 10:00:00


Q26: Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
- `customer_id`, `order_id`, `runner_id`, `rating`, `order_time`, `pickup_time`
- Time between order and pickup, Delivery duration, Average speed, Total number of pizzas

In [41]:
query("""
    SELECT
        co.customer_id,
        co.order_id,
        ro.runner_id,
        r.rating,
        co.order_time,
        ro.pickup_time
    FROM
        customer_orders_clean co
        INNER JOIN runner_orders_clean ro
            ON co.order_id = ro.order_id
        LEFT JOIN ratings r
            ON co.order_id = r.order_id
    WHERE
        ro.distance <> 0
    GROUP BY
        co.order_id
""")

Unnamed: 0,customer_id,order_id,runner_id,rating,order_time,pickup_time
0,101,1,1,5.0,2020-01-01 18:05:02,2020-01-01 18:15:34
1,101,2,1,,2020-01-01 19:00:52,2020-01-01 19:10:54
2,102,3,1,,2020-01-02 23:51:23,2020-01-03 00:12:37
3,103,4,2,2.0,2020-01-04 13:23:46,2020-01-04 13:53:03
4,104,5,3,3.0,2020-01-08 21:00:29,2020-01-08 21:10:57
5,105,7,2,2.0,2020-01-08 21:20:29,2020-01-08 21:30:45
6,102,8,2,,2020-01-09 23:54:33,2020-01-10 00:15:02
7,104,10,1,4.0,2020-01-11 18:34:49,2020-01-11 18:50:20
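
The query above returns the first group of requested columns. A possible way to add the remaining ones (time to pickup, delivery duration, average speed, pizza count) is sketched below on hypothetical rows in an in-memory database. The column aliases `mins_to_pickup`, `delivery_mins`, `avg_speed_kph`, and `pizza_count` are names chosen here, not part of the original schema.

```python
import sqlite3

conn_join = sqlite3.connect(":memory:")
conn_join.executescript("""
    CREATE TABLE customer_orders_clean (order_id INT, customer_id INT, order_time TEXT);
    CREATE TABLE runner_orders_clean (
        order_id INT, runner_id INT, pickup_time TEXT, distance REAL, duration INT);
    CREATE TABLE ratings (order_id INT, rating INT);
    -- Hypothetical sample: one delivered order containing two pizzas.
    INSERT INTO customer_orders_clean VALUES
        (1, 101, '2020-01-01 18:00:00'),
        (1, 101, '2020-01-01 18:00:00');
    INSERT INTO runner_orders_clean VALUES (1, 1, '2020-01-01 18:15:00', 20.0, 30);
    INSERT INTO ratings VALUES (1, 5);
""")

summary = conn_join.execute("""
    SELECT
        co.customer_id, co.order_id, ro.runner_id, r.rating,
        co.order_time, ro.pickup_time,
        ROUND((JULIANDAY(ro.pickup_time) - JULIANDAY(co.order_time)) * 24 * 60, 1)
            AS mins_to_pickup,
        ro.duration AS delivery_mins,
        ROUND(ro.distance / (ro.duration / 60.0), 2) AS avg_speed_kph,
        COUNT(*) AS pizza_count          -- one input row per pizza
    FROM customer_orders_clean co
    JOIN runner_orders_clean ro ON co.order_id = ro.order_id
    LEFT JOIN ratings r ON co.order_id = r.order_id
    WHERE ro.distance IS NOT NULL        -- delivered orders only
    GROUP BY co.customer_id, co.order_id, ro.runner_id, r.rating,
             co.order_time, ro.pickup_time, ro.distance, ro.duration
""").fetchone()

print(summary)
```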


Q27: If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?


In [48]:
# Reuse the earnings query from Q23, then subtract the sum of all delivered distances * 0.3.
# Expenses are computed from runner_orders_clean alone, so each order's distance is counted
# once rather than once per pizza.
query("""
    WITH earnings AS (
        SELECT
            SUM(CASE
                WHEN co.pizza_id = 1 THEN 12
                WHEN co.pizza_id = 2 THEN 10
                ELSE 0 END
                ) AS total_earnings
        FROM
            customer_orders_clean co
            INNER JOIN runner_orders_clean ro
                ON co.order_id = ro.order_id
        WHERE
            ro.distance <> 0
    ),
      
    expenses AS (
        SELECT
            SUM(distance) * 0.3 as total_expenses
        FROM
            runner_orders_clean
        WHERE
            distance <> 0
    )
      
    SELECT
        CAST(ea.total_earnings AS FLOAT) AS total_earnings,
        ex.total_expenses,
        ea.total_earnings - ex.total_expenses AS net_income
    FROM
        earnings ea, expenses ex
""")

Unnamed: 0,total_earnings,total_expenses,net_income
0,138.0,43.56,94.44
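
The figures can be cross-checked by hand. The short sketch below redoes the arithmetic in plain Python, using the delivered distances from the `runner_orders_clean` output and the delivered pizza counts from Q5; the variable names are chosen here for illustration.

```python
# Distances (km) of the eight delivered orders, from runner_orders_clean above.
distances_km = [20.0, 20.0, 13.4, 23.4, 10.0, 25.0, 23.4, 10.0]

# Delivered pizza counts from Q5 and the fixed prices stated in the question.
delivered = {"Meatlovers": 9, "Vegetarian": 3}
prices = {"Meatlovers": 12, "Vegetarian": 10}
RATE_PER_KM = 0.30

revenue = sum(prices[name] * count for name, count in delivered.items())
runner_cost = RATE_PER_KM * sum(distances_km)
net_income = revenue - runner_cost

print(revenue, round(runner_cost, 2), round(net_income, 2))
```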


**E. Other**

Q28: If Danny wants to expand his range of pizzas - how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?



In [53]:
cursor.executescript("""
    DROP TABLE IF EXISTS pizza_names_temp;
    CREATE TEMP TABLE pizza_names_temp AS
        SELECT * FROM pizza_names;
    DROP TABLE IF EXISTS pizza_recipes_temp;
    CREATE TEMP TABLE pizza_recipes_temp AS
        SELECT * FROM pizza_recipes;
    INSERT INTO pizza_names_temp (pizza_id, pizza_name)
        VALUES (3, 'Supreme');
    INSERT INTO pizza_recipes_temp (pizza_id, toppings)
        VALUES (3, '1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12')
""")

<sqlite3.Cursor at 0x2265db202c0>

In [54]:
query("SELECT * FROM pizza_names_temp")

Unnamed: 0,pizza_id,pizza_name
0,1,Meatlovers
1,2,Vegetarian
2,3,Supreme


In [55]:
query("SELECT * FROM pizza_recipes_temp")

Unnamed: 0,pizza_id,toppings
0,1,"1, 2, 3, 4, 5, 6, 8, 10"
1,2,"4, 6, 7, 9, 11, 12"
2,3,"1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12"
