In [1]:
## Import Packages
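import os

import matplotlib.pyplot as plt  # used for the monthly charts further down
import mysql.connector  # provided by the mysql-connector-python package
import pandas as pd
from sqlalchemy import create_engine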

# Use sqlalchemy to connect to my MySQL Database
engine = create_engine('mysql+mysqlconnector://root:****@localhost/paradise_tables')
orders = pd.read_sql('SELECT * FROM orders', engine)
orders.head()

ModuleNotFoundError: No module named 'mysql'
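The connection string above embeds the user name and a masked password directly in the notebook. One possible alternative, sketched here, is to read the credentials from environment variables and build the URL from them. The variable names (`PT_DB_USER`, `PT_DB_PASSWORD`, `PT_DB_HOST`, `PT_DB_NAME`) and the helper `build_mysql_url` are hypothetical and not part of my actual setup.

```python
import os
from urllib.parse import quote


def build_mysql_url(env=os.environ):
    """Assemble a SQLAlchemy URL for the mysql-connector driver.

    The environment variable names below are hypothetical placeholders.
    """
    user = env.get("PT_DB_USER", "root")
    password = env.get("PT_DB_PASSWORD", "")
    host = env.get("PT_DB_HOST", "localhost")
    database = env.get("PT_DB_NAME", "paradise_tables")
    # Percent-encode the credentials so characters such as '@' or '/'
    # cannot be mistaken for URL separators.
    return (
        f"mysql+mysqlconnector://{quote(user, safe='')}:"
        f"{quote(password, safe='')}@{host}/{database}"
    )


# Example with made-up credentials:
print(build_mysql_url({"PT_DB_USER": "root", "PT_DB_PASSWORD": "p@ss/word"}))
# mysql+mysqlconnector://root:p%40ss%2Fword@localhost/paradise_tables
```

The result could then be passed to `create_engine(build_mysql_url())` in place of the literal string.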

## Question 1: Total Revenue
Let's start with a simple question: How much revenue has Paradise Tables generated in total ever since I started it as a side business last summer?

In [11]:
query_1 = '''
            SELECT SUM(Revenue) AS total_rev
            FROM `paradise_tables`.`orders`
'''
pd.read_sql(query_1, engine)


| | total_rev |
|---|---|
| 0 | 9324.799992 |


## Question 2: Total Orders
Now that I know how much revenue I've earned, I want to know how many orders I've received from customers and the average revenue per order. I have to filter my dataset to orders where I made money (Revenue > 0) because I've done some orders for free for birthdays, marketing events, etc.

In [12]:
query_2 = '''
            SELECT COUNT(*) AS total_orders, SUM(Revenue) / COUNT(*) AS avg_rev_per_order
            FROM `paradise_tables`.`orders`
            WHERE Revenue > 0
'''
pd.read_sql(query_2, engine)


| | total_orders |
|---|---|
| 0 | 43 |


## Question 3: Revenue By Date
The next question I'd like to answer is how my revenue and orders break down per month. I can then plot the data to see whether there are any trends.

In [3]:
query_3 = '''
            SELECT DATE_FORMAT(`Date`, '%Y-%m') AS order_month,
                SUM(`Revenue`) AS total_rev,
                COUNT(DISTINCT OrderID) AS total_orders
            FROM `paradise_tables`.`orders`
            GROUP BY DATE_FORMAT(`Date`, '%Y-%m')
            ORDER BY order_month
'''
by_month = pd.read_sql(query_3,engine)
by_month

NameError: name 'engine' is not defined
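One thing worth keeping in mind for monthly breakdowns: grouping on the month number alone and grouping on year plus month give different buckets as soon as the data spans more than one year. The sketch below illustrates the difference on a few invented rows. It uses an in-memory SQLite database as a stand-in for MySQL (so `strftime` plays the role of `DATE_FORMAT`), and none of the values are real Paradise Tables data.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL orders table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, Date TEXT, Revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2022-12-03", 100.0),
        (2, "2022-12-20", 50.0),
        (3, "2023-01-15", 80.0),
        (4, "2023-12-10", 70.0),
    ],
)

# Grouping by month number only: both Decembers land in the same bucket.
by_month_number = conn.execute(
    "SELECT strftime('%m', Date) AS m, SUM(Revenue) "
    "FROM orders GROUP BY m ORDER BY m"
).fetchall()

# Grouping by year and month keeps December 2022 and December 2023 apart.
by_year_month = conn.execute(
    "SELECT strftime('%Y-%m', Date) AS ym, SUM(Revenue), COUNT(DISTINCT OrderID) "
    "FROM orders GROUP BY ym ORDER BY ym"
).fetchall()

print(by_month_number)  # [('01', 80.0), ('12', 220.0)]
print(by_year_month)    # [('2022-12', 150.0, 2), ('2023-01', 80.0, 1), ('2023-12', 70.0, 1)]
```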

In [4]:
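# Plot monthly revenue and monthly order counts on the same axes
plt.plot(by_month.total_rev, label='Total revenue')
plt.plot(by_month.total_orders, label='Total orders')
plt.legend()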

NameError: name 'plt' is not defined

## Question 4
We can see that our highest revenue-generating month was December, most likely because of the holiday season around Christmas and New Year's. Let's pull the orders for that month along with what people ordered.

In [5]:
query_4 = '''
            SELECT *
            FROM `paradise_tables`.`orders` AS o
            JOIN `paradise_tables`.`order_details` AS od
            ON od.OrderID = o.OrderID
            WHERE o.Revenue > 0 AND DATE_FORMAT(o.`Date`, '%Y-%m') = '2022-12'
'''
pd.read_sql(query_4,engine)

NameError: name 'engine' is not defined

## Question 5
Let's move on to the order details table and first look at our most popular items in terms of quantity sold. We have to join to the 'Orders' table and filter on Revenue > 0 so that only real customer orders are counted. Among our top 5 items, Assorted Fruit Pastries are the most popular, followed by different cheesecake cups.

In [6]:
query_5 = '''
            SELECT SUM(Quantity) AS tot_quant,
                ItemID,
                ItemDescription
            FROM `paradise_tables`.`order_details` AS od
            JOIN `paradise_tables`.`orders` AS o
            ON od.OrderID = o.OrderID
            WHERE Revenue > 0
            GROUP BY ItemID, ItemDescription
            ORDER BY tot_quant DESC
'''
pd.read_sql(query_5 ,engine)

NameError: name 'engine' is not defined

## Question 6
Next, I want summary statistics for the total quantity of items a customer orders: the average, standard deviation, maximum, and minimum per order.

In [7]:
query_6 = '''
        SELECT AVG(TotalQuantity) AS avg_quantity,
            STDDEV(TotalQuantity) as std_quantity,
            MAX(TotalQuantity) as max_quantity,
            MIN(TotalQuantity) as min_quantity
            FROM (
                SELECT SUM(Quantity) AS TotalQuantity, 
                    od.OrderID
                FROM `paradise_tables`.`order_details` AS od
                JOIN `paradise_tables`.`orders` AS o
                ON o.OrderID = od.OrderID
                WHERE Revenue > 0
                GROUP BY od.OrderID) AS a
'''
pd.read_sql(query_6, engine)

NameError: name 'engine' is not defined
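As a cross-check on what these aggregates mean, the same summary can be computed in plain Python. One detail to be aware of: MySQL's `STD()`, `STDDEV()`, and `STDDEV_POP()` return the population standard deviation, while `STDDEV_SAMP()` returns the sample version. The sketch below shows both on a handful of made-up per-order totals (these are not real order quantities).

```python
import statistics

# Hypothetical total quantities per paid order (invented for illustration).
order_quantities = [24, 36, 12, 48, 30]

summary = {
    "avg_quantity": statistics.mean(order_quantities),
    # Population standard deviation (divides by n), like MySQL STDDEV_POP().
    "std_population": statistics.pstdev(order_quantities),
    # Sample standard deviation (divides by n - 1), like MySQL STDDEV_SAMP().
    "std_sample": statistics.stdev(order_quantities),
    "max_quantity": max(order_quantities),
    "min_quantity": min(order_quantities),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With only a few dozen orders, the two standard deviations can differ noticeably, so it helps to know which one a query reports.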

## Question 7
Now let's dive into the actual profitability of our business. We join our Orders table with our Ingredients, Supplies Used, Overhead, and Labor tables to get the revenue and costs for each order. The supply cost is the quantity of each supply used on an order multiplied by the average unit price I've paid for that supply, summed over the order. The ingredient cost is the total spent on ingredients for the order, and the labor cost is whatever I paid anybody who assisted me in making it. The final profit column subtracts all of these costs from the total revenue for that order.

This does not 100% represent true profit per order: there are some supplies I don't take into account when entering data, such as stickers, and it ignores all of the start-up costs I've spent. However, it helps me know how profitable my business has been since I officially launched and how much profit I've realized per order.

In [8]:
query_7 = '''
            SELECT o.Date, 
                    o.OrderID, 
                    o.Revenue, 
                    sub.supply_cost, 
                    ingredients.ingredient_cost, 
                    labor.compensation,
                    -- compensation is NULL for orders with no paid help, so treat it as 0
                    sub.supply_cost + ingredients.ingredient_cost + COALESCE(labor.compensation, 0) AS total_cost,
                    o.Revenue - (sub.supply_cost + ingredients.ingredient_cost + COALESCE(labor.compensation, 0)) AS profit
                FROM `paradise_tables`.`orders` AS o
                JOIN (
                    -- supply cost per order: quantity used times average unit price paid
                    SELECT SUM(s.Quantity * a.avg_supply_price) AS supply_cost, 
                            s.OrderID
                        FROM `paradise_tables`.`supplies_used` AS s
                        LEFT JOIN (
                            SELECT AVG(Cost / Quantity) AS avg_supply_price, SupplyID
                            FROM `paradise_tables`.`overhead`
                            GROUP BY SupplyID) AS a
                        ON s.SupplyID = a.SupplyID  
                        GROUP BY s.OrderID) AS sub
                ON o.OrderID = sub.OrderID
                JOIN (
                    SELECT SUM(Cost) AS ingredient_cost, 
                        OrderID
                    FROM `paradise_tables`.`ingredients` AS i
                    GROUP BY i.OrderID) AS ingredients
                ON o.OrderID = ingredients.OrderID 
                LEFT JOIN (
                    SELECT SUM(Compensation) AS compensation, 
                    l.OrderID
                    FROM `paradise_tables`.`labor` AS l
                    GROUP BY l.OrderID
                ) AS labor
                ON o.OrderID = labor.OrderID'''
pd.read_sql(query_7,engine)

NameError: name 'engine' is not defined
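To make the per-order arithmetic concrete, here is a small worked example in plain Python. It is only a sketch of the calculation described above: the function `order_profit`, the supply names, and every number are hypothetical.

```python
def order_profit(revenue, supplies_used, avg_supply_price, ingredient_cost, labor_cost=0.0):
    """Return (total_cost, profit) for a single order.

    supplies_used:    {supply_id: quantity used on this order}
    avg_supply_price: {supply_id: average price paid per unit}
    """
    # Supply cost: quantity used times the average unit price, summed over supplies.
    supply_cost = sum(qty * avg_supply_price[sid] for sid, qty in supplies_used.items())
    total_cost = supply_cost + ingredient_cost + labor_cost
    return total_cost, revenue - total_cost


# Hypothetical order: 24 cups at $0.25 each and 2 boxes at $1.50 each,
# $60 of ingredients, and $20 paid to a helper.
total_cost, profit = order_profit(
    revenue=200.0,
    supplies_used={"cup": 24, "box": 2},
    avg_supply_price={"cup": 0.25, "box": 1.5},
    ingredient_cost=60.0,
    labor_cost=20.0,
)
print(total_cost, profit)  # 89.0 111.0
```

Orders with no paid help simply leave `labor_cost` at its default of 0.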

## Question 8
I've created a table to track the hours I spend per order on the main tasks: buying ingredients, making desserts, delivering, and setting up. These are estimates, as I sometimes forget to track how much time I've actually spent on each task. We see that the majority of my time is spent making desserts, which makes sense as that is the process that takes the longest.

In [9]:
query_8 = '''
            SELECT SUM(Hours) AS hours, 
                LaborID, 
                LaborDescription
            FROM `paradise_tables`.`personal_labor`
            GROUP BY LaborID, LaborDescription
'''
pd.read_sql(query_8,engine)

NameError: name 'engine' is not defined

## Question 9
Since my largest expense is buying ingredients for my orders, I need to know how much I'm spending on ingredients and what my largest ingredient expenses are in terms of quantity and cost. I create a table that only includes ingredients that I've had to purchase for more than 5 orders. For each one I look at the total cost, the total quantity, the average cost per unit, and how many orders are associated with a purchase of that ingredient. We see that the most expensive and most frequently bought items are Heavy Cream and Cream Cheese, indicating that I may want to buy in bulk or find deals to lower my costs.

In [10]:
query_9 = '''
            SELECT SUM(Cost) as total_cost,
                SUM(Quantity) as qty,
                SUM(Cost) / SUM(Quantity) as avg_cost,
                COUNT(*) as cnt, 
                IngredientDescription, 
                Location
            FROM `paradise_tables`.`ingredients` 
            WHERE Location IS NOT NULL
            GROUP BY Location, IngredientID, IngredientDescription
            HAVING count(*) > 5
            ORDER BY cnt DESC, qty ASC
'''
pd.read_sql(query_9,engine)

NameError: name 'engine' is not defined
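The grouping and filtering logic in this query can also be sketched in plain Python, which may make the `HAVING` step easier to follow. Everything below is illustrative: the function `summarize`, the store names, and the purchase rows are hypothetical, and only the ingredient names come from my data.

```python
from collections import defaultdict

# (location, ingredient, quantity, cost) rows; values invented for illustration.
purchases = (
    [("Store A", "Heavy Cream", 2, 10.0)] * 6
    + [("Store B", "Cream Cheese", 1, 8.0)] * 2
)


def summarize(rows, min_purchases=5):
    """Aggregate per (location, ingredient) and keep frequently bought items."""
    groups = defaultdict(lambda: {"total_cost": 0.0, "qty": 0, "cnt": 0})
    for location, ingredient, qty, cost in rows:
        g = groups[(location, ingredient)]
        g["total_cost"] += cost
        g["qty"] += qty
        g["cnt"] += 1
    result = [
        {
            "location": location,
            "ingredient": ingredient,
            **g,
            # Average cost per unit, as SUM(Cost) / SUM(Quantity) in the query.
            "avg_cost": g["total_cost"] / g["qty"],
        }
        for (location, ingredient), g in groups.items()
        if g["cnt"] > min_purchases  # mirrors HAVING COUNT(*) > 5
    ]
    # Most frequently bought first, then smaller quantities first.
    return sorted(result, key=lambda r: (-r["cnt"], r["qty"]))


for row in summarize(purchases):
    print(row)
```

In this toy data only the Store A Heavy Cream group survives the filter, because the Cream Cheese group has just 2 purchases.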

## Question 10
I'd like to view my highest-paying customers and calculate columns such as: total revenue, orders, number of items and quantity that they've purchased. I only consider customers who have spent more than $300 and who have purchased more than 5 distinct items. I sort by number of orders descending and then total revenue descending.

In [11]:
query_10 ='''
            WITH temp_table AS (
                SELECT SUM(Revenue) AS total_rev,
                    CustomerName
                FROM `paradise_tables`.`orders`
                GROUP BY CustomerName
                HAVING total_rev > 300)
            SELECT t.CustomerName, t.total_rev, sub.num_orders, sub.num_items, sub.total_qty
            FROM temp_table as t
            JOIN (
                SELECT CustomerName, 
                    COUNT(DISTINCT o.OrderID) AS num_orders, 
                    COUNT(DISTINCT od.ItemID) AS num_items, 
                    SUM(Quantity) AS total_qty
                FROM `paradise_tables`.`orders` AS o 
                JOIN `paradise_tables`.`order_details` AS od
                ON o.OrderID = od.OrderID
                GROUP BY CustomerName
                HAVING count(DISTINCT od.ItemID) > 5 ) AS sub
            ON t.CustomerName = sub.CustomerName
            ORDER BY sub.num_orders DESC, total_rev DESC


'''
pd.read_sql(query_10,engine)


| | CustomerName | total_rev | num_orders | num_items | total_qty |
|---|---|---|---|---|---|
| 0 | Shelley Gao | 613.700001 | 3 | 9 | 237.0 |
| 1 | Shereen Aclan | 418.199997 | 2 | 9 | 132.0 |
| 2 | Ella Diep | 685.0 | 1 | 7 | 182.0 |
| 3 | Michelle Alvarenga | 596.200012 | 1 | 9 | 216.0 |
| 4 | Shannen Vong | 545.0 | 1 | 6 | 101.0 |
| 5 | Sandy Cakes | 430.0 | 1 | 6 | 101.0 |
| 6 | Andrea Wong | 377.0 | 1 | 6 | 101.0 |
| 7 | Denice | 360.0 | 1 | 6 | 101.0 |
| 8 | Tam Tran | 316.0 | 1 | 6 | 73.0 |
