# Supermarket Sales Analysis
Set up the notebook to run SQL commands.

In [52]:
import pandas as pd
import sqlite3

# Import data
df = pd.read_csv('supermarket_sales.csv')

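# Fix column names: replace spaces with underscores first, then rename,
# so that the 'Tax_5%' key matches the cleaned header
df.columns = df.columns.str.replace(' ', '_', regex=False)
df.rename(columns={'Total': 'Total_Revenue', 'Tax_5%': 'Taxes'}, inplace=True)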

# Convert the Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Connecting to SQLite3 database
conn = sqlite3.connect('supermarket_sales.db')

# Write the DataFrame to a 'sales' table in the database
df.to_sql('sales', conn, index=False, if_exists='replace')

# Activate SQL extension
%load_ext sql

# Connect SQL magic (to use SQL commands) to our database
%sql sqlite:///supermarket_sales.db



In [3]:
%%sql
SELECT * FROM sales
LIMIT 3

 * sqlite:///supermarket_sales.db
Done.


Invoice_ID,Branch,City,Customer_type,Gender,Product_line,Unit_price,Quantity,Tax_5%,Total_Revenue,Date,Time,Payment,cogs,gross_margin_percentage,gross_income,Rating
750-67-8428,A,Yangon,Member,Female,Health and beauty,74.69,7,26.1415,548.9715,1/5/2019,13:08,Ewallet,522.83,4.761904762,26.1415,9.1
226-31-3081,C,Naypyitaw,Normal,Female,Electronic accessories,15.28,5,3.82,80.22,3/8/2019,10:29,Cash,76.4,4.761904762,3.82,9.6
631-41-3108,A,Yangon,Normal,Male,Home and lifestyle,46.33,7,16.2155,340.5255,3/3/2019,13:23,Credit card,324.31,4.761904762,16.2155,7.4


## Business Questions

### 1. General Sales Analysis
- What is the total revenue generated by the supermarket across all branches?
- Which branch generates the highest revenue?
- How many transactions were made in each branch?
- What are the top 3 products by revenue in each branch?

SQL Skills:
- Basic `SELECT`, `SUM`, `GROUP BY`, and `ORDER BY`.

In [6]:
%%sql
-- What is the total revenue generated by the supermarket across all branches?
SELECT SUM(Total_Revenue)
FROM sales

 * sqlite:///supermarket_sales.db
Done.


SUM(Total_Revenue)
322966.749


In [7]:
%%sql
-- Which branch generates the highest revenue?
SELECT Branch, SUM(Total_Revenue)
FROM sales
GROUP BY Branch
ORDER BY SUM(Total_Revenue) DESC

 * sqlite:///supermarket_sales.db
Done.


Branch,SUM(Total_Revenue)
C,110568.7065
A,106200.3705
B,106197.672
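
Answering "which branch generates the highest revenue" relies on sorting the grouped rows by the aggregate itself rather than by a raw column. The following is a minimal sketch of that pattern using Python's built-in `sqlite3` module. The toy rows and the `branch_revenue` alias are hypothetical and are not taken from the real dataset.

```python
import sqlite3

# Hypothetical toy rows (branch, revenue); not taken from the real dataset.
rows = [("A", 100.0), ("A", 50.0), ("B", 120.0), ("C", 90.0), ("C", 95.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Branch TEXT, Total_Revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Sorting on the aggregated alias puts the highest-revenue branch first.
branch_revenue = conn.execute(
    """
    SELECT Branch, SUM(Total_Revenue) AS branch_revenue
    FROM sales
    GROUP BY Branch
    ORDER BY branch_revenue DESC
    """
).fetchall()
conn.close()

print(branch_revenue)
```

With an alias on the aggregate, the first returned row is the answer to the question, so no manual scan of the output is needed.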


In [8]:
%%sql
-- How many transactions were made in each branch?
SELECT Branch, COUNT(Invoice_ID)
FROM sales
GROUP BY Branch
ORDER BY COUNT(Invoice_ID) DESC

 * sqlite:///supermarket_sales.db
Done.


Branch,COUNT(Invoice_ID)
A,340
B,332
C,328


In [74]:
%%sql
-- What are the top 3 products by revenue in each branch?
SELECT Branch, Product_line, total_rev
FROM (
    SELECT Branch, Product_Line, ROUND(SUM(Total_Revenue), 2) AS total_rev,
           RANK() OVER (
               PARTITION BY Branch
               ORDER BY SUM(Total_Revenue) DESC
           ) AS rank_rev
    FROM sales
    GROUP BY Branch, Product_line
) ranking_revenue
WHERE rank_rev BETWEEN 1 AND 3

 * sqlite:///supermarket_sales.db
Done.


Branch,Product_Line,total_rev
A,Home and lifestyle,22417.2
A,Sports and travel,19372.7
A,Electronic accessories,18317.11
B,Sports and travel,19988.2
B,Health and beauty,19980.66
B,Home and lifestyle,17549.16
C,Food and beverages,23766.85
C,Fashion accessories,21560.07
C,Electronic accessories,18968.97
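
The ranking for a single branch can be cross-checked outside SQL. The sketch below sorts branch A's per-product-line revenue in plain Python and keeps the first three entries. The figures are copied from the branch-by-product-line revenue output later in this notebook. Unlike `RANK()`, a plain sort does not treat ties specially, so this is only a sanity check under the assumption that no two totals are equal.

```python
# Branch A revenue per product line, copied from the branch/product-line
# comparison output in this notebook.
branch_a = {
    "Electronic accessories": 18317.1135,
    "Fashion accessories": 16332.5085,
    "Food and beverages": 17163.1005,
    "Health and beauty": 12597.753,
    "Home and lifestyle": 22417.1955,
    "Sports and travel": 19372.6995,
}

# Sort product lines by revenue, highest first, and keep the first three.
top3 = sorted(branch_a.items(), key=lambda item: item[1], reverse=True)[:3]

for product_line, revenue in top3:
    print(product_line, round(revenue, 2))
```

The three product lines that come out should match the branch A rows returned by the windowed query.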


### 2. Customer Behavior
- What is the average spending per customer?
- Which customer group (e.g., loyalty vs. non-loyalty) spends the most on average?
- How many customers made multiple purchases?

SQL Skills:
- Use of aggregate functions like `AVG` and `COUNT`.
- Mathematical operations in queries.

In [9]:
%%sql
-- What is the average spending per customer?
-- (The data has no customer ID, so this is the average per invoice.)
SELECT AVG(Total_revenue) FROM sales

 * sqlite:///supermarket_sales.db
Done.


AVG(Total_revenue)
322.966749


In [12]:
%%sql
-- Which customer group (e.g., loyalty vs. non-loyalty) spends the most on average?
SELECT Customer_type, AVG(Total_Revenue) as avg_revenue
FROM sales
GROUP BY Customer_type
ORDER BY avg_revenue DESC

 * sqlite:///supermarket_sales.db
Done.


Customer_type,avg_revenue
Member,327.7913053892215
Normal,318.12285571142286


In [14]:
%%sql
-- What is the average spending by branch and customer type?
SELECT Branch, Customer_type, AVG(Total_revenue) as avg_revenue
FROM sales
GROUP BY Branch, Customer_type
ORDER BY avg_revenue DESC

 * sqlite:///supermarket_sales.db
Done.


Branch,Customer_type,avg_revenue
C,Normal,337.65675471698114
C,Member,336.57563609467456
B,Member,325.4829454545455
A,Member,321.1824880239521
B,Normal,314.3292574850299
A,Normal,303.8317630057803



### 3. Product Performance
- Which product category generates the most revenue?
- What is the best-selling product in each branch?
- How does the revenue of each product category compare between branches?

SQL Skills:
- Aggregations with `SUM` and `GROUP BY`.
- `WHERE` for conditioning.
- `ORDER BY` and `RANK` for ranking.

In [15]:
%%sql
-- Which product category generates the most revenue?
SELECT Product_line, SUM(Total_revenue) as total_rev
FROM sales
GROUP BY Product_line
ORDER BY total_rev DESC

 * sqlite:///supermarket_sales.db
Done.


Product_line,total_rev
Food and beverages,56144.844
Sports and travel,55122.8265
Electronic accessories,54337.5315
Fashion accessories,54305.895
Home and lifestyle,53861.913
Health and beauty,49193.739


In [46]:
%%sql
-- What is the best-selling product in each branch?
SELECT Branch, Product_line, total_sold
FROM (
    SELECT Branch, Product_line, SUM(Quantity) AS total_sold,
           RANK() OVER (
               PARTITION BY Branch
               ORDER BY SUM(Quantity) DESC
           ) AS prod_rank
    FROM sales
    GROUP BY Branch, Product_line
) ranking_subquery
WHERE prod_rank = 1


 * sqlite:///supermarket_sales.db
Done.


Branch,Product_line,total_sold
A,Home and lifestyle,371
B,Sports and travel,322
C,Food and beverages,369


In [49]:
%%sql
-- How does the revenue of each product category compare between branches?
SELECT Branch, Product_line, SUM(Total_Revenue) as total_rev
FROM sales
GROUP BY Branch, Product_line
ORDER BY Branch, Product_line

 * sqlite:///supermarket_sales.db
Done.


Branch,Product_line,total_rev
A,Electronic accessories,18317.1135
A,Fashion accessories,16332.5085
A,Food and beverages,17163.1005
A,Health and beauty,12597.753
A,Home and lifestyle,22417.1955
A,Sports and travel,19372.6995
B,Electronic accessories,17051.4435
B,Fashion accessories,16413.3165
B,Food and beverages,15214.8885
B,Health and beauty,19980.66


### 4. Revenue Trends
- What is the revenue trend over time (e.g., by day, week, or month)?
- Which day of the week has the highest sales?
- Are there seasonal variations in revenue?

SQL Skills:
- Date manipulation and grouping (`strftime` in SQLite, `GROUP BY`).
- Conditions with `CASE`.

In [None]:
%%sql
-- What is the revenue trend over time (e.g., by day, week, or month)?
SELECT 
    strftime('%Y', Date) AS Year, 
    strftime('%m', Date) AS Month, 
    SUM(Total_Revenue) AS Monthly_Revenue
FROM sales
GROUP BY Year, Month
ORDER BY Year, Month

-- SQLite has no EXTRACT function, so strftime is used to pull out date parts.
-- In a MySQL database, the equivalent would be:
-- EXTRACT(YEAR FROM Date)
-- EXTRACT(MONTH FROM Date)

 * sqlite:///supermarket_sales.db
Done.


Year,Month,Monthly_Revenue
2019,1,116291.868
2019,2,97219.374
2019,3,109455.507
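
The monthly grouping only works because the `Date` column was converted before being written to the database. The sketch below shows why, assuming that pandas stores datetime values in SQLite as ISO-8601 text such as `2019-01-05 00:00:00`. `strftime` can parse that form but returns `NULL` for the raw `month/day/year` text found in the CSV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ISO-8601 text, the form assumed to be written for datetime columns.
iso_month = conn.execute(
    "SELECT strftime('%m', '2019-01-05 00:00:00')"
).fetchone()[0]

# The raw CSV form (month/day/year) is not a format SQLite can parse.
raw_month = conn.execute(
    "SELECT strftime('%m', '1/5/2019')"
).fetchone()[0]
conn.close()

print(iso_month, raw_month)
```

If the conversion step were skipped, every row would fall into a single `NULL` group instead of one group per month.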


In [42]:
%%sql
-- Which day of the week has the highest sales?
SELECT 
    strftime('%w', Date) AS Day,   
    SUM(Total_Revenue) AS total_rev,
    SUM(Quantity) AS total_sold,
    CASE 
        WHEN strftime('%w', Date) = '0' THEN 'Sunday'
        WHEN strftime('%w', Date) = '1' THEN 'Monday'
        WHEN strftime('%w', Date) = '2' THEN 'Tuesday'
        WHEN strftime('%w', Date) = '3' THEN 'Wednesday'
        WHEN strftime('%w', Date) = '4' THEN 'Thursday'
        WHEN strftime('%w', Date) = '5' THEN 'Friday'
        WHEN strftime('%w', Date) = '6' THEN 'Saturday'
    END AS DayName
FROM sales
GROUP BY DayName
ORDER BY Day ASC


 * sqlite:///supermarket_sales.db
Done.


Day,total_rev,total_sold,DayName
0,44457.8925,778,Sunday
1,37899.078,638,Monday
2,51482.2455,862,Tuesday
3,43731.135,800,Wednesday
4,45349.248,755,Thursday
5,43926.3405,758,Friday
6,56120.8095,919,Saturday
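
The `CASE` expression maps the `%w` weekday number (0 for Sunday through 6 for Saturday) to a name. The same mapping can be done on the Python side with a lookup list, as in this minimal sketch. `DAY_NAMES` is a hypothetical helper, and the date used is that of the first sample invoice shown earlier, read as month/day/year.

```python
import sqlite3

# strftime('%w') numbers weekdays from 0 (Sunday) to 6 (Saturday).
DAY_NAMES = [
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
]

conn = sqlite3.connect(":memory:")
# 2019-01-05 is the date of the first sample invoice, in ISO form.
day_index = conn.execute(
    "SELECT strftime('%w', '2019-01-05')"
).fetchone()[0]
conn.close()

# strftime returns text, so convert to int before indexing the list.
day_name = DAY_NAMES[int(day_index)]
print(day_index, day_name)
```

Keeping the mapping in SQL, as the query above does, has the advantage that the result set is readable on its own; the lookup list is an alternative when the rows are post-processed in Python anyway.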


### 5. Operational Insights
- What is the Cost of Goods (CoG) per product?
- What is the network net profit?
- What is the distribution of payment methods (e.g., cash, credit card)?

SQL Skills:
- Filtering with `WHERE` and conditional calculations.
- Aggregations and `CASE` for categorizing payment methods.

In [50]:
%%sql
-- What is the Cost of Goods (CoG) per product?
SELECT Branch, Product_line, SUM(cogs) as total_cogs, ROUND(AVG(cogs),2) as avg_cogs
FROM sales
GROUP BY Branch, Product_line
ORDER BY Branch 


 * sqlite:///supermarket_sales.db
Done.


Branch,Product_line,total_cogs,avg_cogs
A,Electronic accessories,17444.87,290.75
A,Fashion accessories,15554.77,305.0
A,Food and beverages,16345.81,281.82
A,Health and beauty,11997.86,255.27
A,Home and lifestyle,21349.71,328.46
A,Sports and travel,18450.19,312.72
B,Electronic accessories,16239.47,295.26
B,Fashion accessories,15631.73,252.12
B,Food and beverages,14490.37,289.81
B,Health and beauty,19029.2,359.04


In [59]:
%%sql
-- What is the network net profit?
SELECT 
    Branch,
    Product_line,
    ROUND(SUM(Total_Revenue) - SUM(cogs), 2) AS Net_profit
FROM sales
GROUP BY Branch, Product_line

 * sqlite:///supermarket_sales.db
Done.


Branch,Product_line,Net_profit
A,Electronic accessories,872.24
A,Fashion accessories,777.74
A,Food and beverages,817.29
A,Health and beauty,599.89
A,Home and lifestyle,1067.49
A,Sports and travel,922.51
B,Electronic accessories,811.97
B,Fashion accessories,781.59
B,Food and beverages,724.52
B,Health and beauty,951.46
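
It helps to check what `Total_Revenue - cogs` measures in this dataset. The sketch below uses the three sample invoices shown at the top of the notebook and compares the difference with the `Tax_5%` column. For these rows the two agree, and the ratio to revenue reproduces the constant `gross_margin_percentage`. Whether this holds for every row is an assumption that would need a query over the full table.

```python
import math

# (Total_Revenue, cogs, Tax_5%) for the three sample invoices shown earlier.
sample_rows = [
    (548.9715, 522.83, 26.1415),
    (80.22, 76.4, 3.82),
    (340.5255, 324.31, 16.2155),
]

margins = []
for total, cogs, tax in sample_rows:
    margin = total - cogs
    # Revenue minus cogs reproduces the 5% tax column for these rows.
    assert math.isclose(margin, tax, abs_tol=1e-6)
    margin_pct = margin / total * 100
    margins.append(margin_pct)
    print(round(margin, 4), round(margin_pct, 4))
```

Under that assumption, the figure labelled `Net_profit` is the 5% amount added on top of `cogs`, which is the same quantity the dataset reports as `gross_income`.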


In [62]:
%%sql
-- What is the distribution of payment methods (e.g., cash, credit card)?
SELECT Branch, Payment, ROUND(SUM(Total_Revenue), 2) as total_rev
FROM sales
GROUP BY Branch, Payment
ORDER BY Branch, total_rev DESC

 * sqlite:///supermarket_sales.db
Done.


Branch,Payment,total_rev
A,Ewallet,39324.37
A,Cash,33781.25
A,Credit card,33094.75
B,Credit card,37344.86
B,Cash,35339.46
B,Ewallet,33513.35
C,Cash,43085.86
C,Ewallet,37155.38
C,Credit card,30327.47
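
A distribution is often easier to read as shares than as absolute totals. As a rough sketch, the branch A totals from the output above can be converted to percentages of that branch's revenue. The variable names are illustrative only.

```python
# Revenue per payment method for branch A, copied from the output above.
branch_a_payments = {
    "Ewallet": 39324.37,
    "Cash": 33781.25,
    "Credit card": 33094.75,
}

branch_total = sum(branch_a_payments.values())

# Share of branch revenue per payment method, in percent.
shares = {
    method: round(revenue / branch_total * 100, 1)
    for method, revenue in branch_a_payments.items()
}
print(shares)
```

The same calculation could be repeated for branches B and C to compare payment preferences across locations.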
