### Module 5: SQL for Data Analysis

This module introduces the SQL concepts data analysts use most often to query structured data. The queries answer the same business questions explored with Pandas, which reinforces analytical thinking across tools.

The focus is on translating business requirements into efficient database queries and understanding how SQL complements Python-based analysis in real-world data workflows.

In [1]:
import pandas as pd
import sqlite3

In [2]:
df = pd.read_csv("../data/retail_sales_cleaned.csv")
df.head()

Unnamed: 0,order_id,order_date,region,product,category,quantity,unit_price
0,1001,2023-01-05,North,Laptop,Electronics,2.0,750.0
1,1002,2023-01-07,South,Mobile,Electronics,8.0,300.0
2,1003,2023-01-10,East,Chair,Furniture,10.0,45.0
3,1004,2023-01-15,West,Table,Furniture,8.0,120.0
4,1005,2023-01-20,Unknown,Headphones,Electronics,8.0,60.0


In [3]:
conn = sqlite3.connect("../data/retail_sales.db")

In [5]:
# Write the DataFrame to a "sales" table, replacing the table if it already exists
df.to_sql("sales", conn, if_exists="replace", index=False)

6

In [7]:
query = """ SELECT * FROM sales LIMIT 5; """
pd.read_sql(query, conn)

Unnamed: 0,order_id,order_date,region,product,category,quantity,unit_price
0,1001,2023-01-05,North,Laptop,Electronics,2.0,750.0
1,1002,2023-01-07,South,Mobile,Electronics,8.0,300.0
2,1003,2023-01-10,East,Chair,Furniture,10.0,45.0
3,1004,2023-01-15,West,Table,Furniture,8.0,120.0
4,1005,2023-01-20,Unknown,Headphones,Electronics,8.0,60.0


In [9]:
# Revenue per order line: quantity multiplied by unit price
df["revenue"] = df["quantity"] * df["unit_price"]

In [12]:
df.head(3)

Unnamed: 0,order_id,order_date,region,product,category,quantity,unit_price,revenue
0,1001,2023-01-05,North,Laptop,Electronics,2.0,750.0,1500.0
1,1002,2023-01-07,South,Mobile,Electronics,8.0,300.0,2400.0
2,1003,2023-01-10,East,Chair,Furniture,10.0,45.0,450.0


In [14]:
# Rewrite the "sales" table so it includes the new revenue column
df.to_sql("sales", conn, if_exists="replace", index=False)

6

In [15]:
query = """ SELECT product, revenue FROM sales LIMIT 5; """
pd.read_sql(query, conn)

Unnamed: 0,product,revenue
0,Laptop,1500.0
1,Mobile,2400.0
2,Chair,450.0
3,Table,960.0
4,Headphones,480.0


In [16]:
query = """ SELECT * FROM sales WHERE category = 'Electronics'; """
pd.read_sql(query, conn)

Unnamed: 0,order_id,order_date,region,product,category,quantity,unit_price,revenue
0,1001,2023-01-05,North,Laptop,Electronics,2.0,750.0,1500.0
1,1002,2023-01-07,South,Mobile,Electronics,8.0,300.0,2400.0
2,1005,2023-01-20,Unknown,Headphones,Electronics,8.0,60.0,480.0
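
When the filter value comes from a variable rather than being typed into the query, a bound parameter is one option. The sketch below is illustrative only: it assumes a small in-memory table with hypothetical rows rather than the module's `retail_sales.db` file, and it uses the `?` placeholder style of the `sqlite3` driver together with the `params` argument of `pd.read_sql`.

```python
import sqlite3

import pandas as pd

# Hypothetical rows standing in for the sales table (not the module's data file).
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "product": ["Laptop", "Chair", "Headphones"],
        "category": ["Electronics", "Furniture", "Electronics"],
    }
).to_sql("sales", conn, index=False)

# The ? placeholder is filled from params, so the value is passed
# separately instead of being pasted into the SQL text.
query = "SELECT product, category FROM sales WHERE category = ?;"
electronics = pd.read_sql(query, conn, params=("Electronics",))

conn.close()
print(electronics)
```

Passing the value separately avoids quoting mistakes in the query string and makes the same query reusable for other categories.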


In [18]:
query = """ SELECT category, SUM(revenue) AS total_revenue FROM sales GROUP BY category; """
pd.read_sql(query, conn)

Unnamed: 0,category,total_revenue
0,Electronics,4380.0
1,Furniture,1500.0


In [20]:
query = """
SELECT category, SUM(revenue) AS total_revenue
FROM sales
GROUP BY category
ORDER BY total_revenue;
"""

pd.read_sql(query, conn)

Unnamed: 0,category,total_revenue
0,Furniture,1500.0
1,Electronics,4380.0


## SQL ↔ Pandas Mapping

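- SELECT → column selection, e.g. `df[["product", "revenue"]]`
- WHERE → row filtering with a boolean mask, e.g. `df[df["category"] == "Electronics"]`
- GROUP BY → `groupby()` followed by an aggregation such as `sum()`
- ORDER BY → `sort_values()`
- LIMIT → `head()`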
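
To make the mapping concrete, the sketch below answers one question twice, once in SQL and once in Pandas. It is a minimal illustration under stated assumptions: the rows are hypothetical sample data held in an in-memory SQLite database, and the `revenue > 460` threshold is arbitrary, chosen only so that the WHERE step has a visible effect.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows mirroring the columns used in this module.
df = pd.DataFrame(
    {
        "product": ["Laptop", "Mobile", "Chair", "Table", "Headphones"],
        "category": ["Electronics", "Electronics", "Furniture", "Furniture", "Electronics"],
        "revenue": [1500.0, 2400.0, 450.0, 960.0, 480.0],
    }
)

# An in-memory database keeps the sketch independent of any file on disk.
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, if_exists="replace", index=False)

# SQL: every clause from the mapping in a single query.
sql_result = pd.read_sql(
    """
    SELECT category, SUM(revenue) AS total_revenue
    FROM sales
    WHERE revenue > 460
    GROUP BY category
    ORDER BY total_revenue DESC
    LIMIT 1;
    """,
    conn,
)

# Pandas: the same steps chained one after another.
pandas_result = (
    df[df["revenue"] > 460]                          # WHERE
    .groupby("category", as_index=False)["revenue"]  # GROUP BY
    .sum()                                           # SUM(revenue)
    .rename(columns={"revenue": "total_revenue"})    # AS total_revenue
    .sort_values("total_revenue", ascending=False)   # ORDER BY ... DESC
    .head(1)                                         # LIMIT 1
    .reset_index(drop=True)
)

conn.close()
print(sql_result)
print(pandas_result)
```

Both results should contain the same single row. The Pandas chain lists the steps in the order they are applied, whereas the SQL text puts SELECT first even though the filter and grouping run before the final columns are produced.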