# SQL Exercise Questions - 02 SQL Concepts

## Introduction

This notebook contains exercise questions covering all SQL concepts taught in the '02 SQL' folder. These exercises are designed to test your understanding of:

- **SQL Joins and Set Operators** (INNER, LEFT, RIGHT, FULL OUTER JOIN; UNION)
- **Handling NULLs** (COALESCE, NVL, NULL handling strategies)
- **GROUP BY and Aggregations** (COUNT, SUM, AVG, MIN, MAX, HAVING)
- **Window Functions** (ROW_NUMBER, RANK, DENSE_RANK, PARTITION BY, ORDER BY, window frames)
- **Common Table Expressions (CTEs)** (WITH clause, finding/deleting duplicates, recursive queries)
- **Troubleshooting** (Debugging common SQL issues)

**Database:** All solutions should be written in **Snowflake SQL**. Notes for SQL Server are provided where syntax differs.

**Dataset:** All exercises use the following tables from `CETPA_DB.PUBLIC`:
- `customers` - Customer information
- `geolocation` - Geographic location data
- `orders` - Order information
- `order_items` - Items in each order
- `order_payments` - Payment information for orders
- `order_reviews` - Customer reviews for orders
- `products` - Product catalog
- `product_category_name_translation` - Product category translations
- `sellers` - Seller information

**Difficulty Levels:**
- 🟢 **Beginner** - Basic concepts, single table or simple joins
- 🟡 **Intermediate** - Multiple joins, aggregations, basic window functions
- 🔴 **Advanced** - Complex CTEs, advanced window functions, multi-step logic

---

## Table Schema Reference

```sql
-- customers
CUSTOMER_ID (VARCHAR)
CUSTOMER_UNIQUE_ID (VARCHAR)
CUSTOMER_ZIP_CODE_PREFIX (NUMBER)
CUSTOMER_CITY (VARCHAR)
CUSTOMER_STATE (VARCHAR)

-- geolocation
GEOLOCATION_ZIP_CODE_PREFIX (NUMBER)
GEOLOCATION_LAT (NUMBER)
GEOLOCATION_LNG (NUMBER)
GEOLOCATION_CITY (VARCHAR)
GEOLOCATION_STATE (VARCHAR)

-- orders
ORDER_ID (VARCHAR)
CUSTOMER_ID (VARCHAR)
ORDER_STATUS (VARCHAR)
ORDER_PURCHASE_TIMESTAMP (TIMESTAMP_NTZ)
ORDER_APPROVED_AT (TIMESTAMP_NTZ)
ORDER_DELIVERED_CARRIER_DATE (TIMESTAMP_NTZ)
ORDER_DELIVERED_CUSTOMER_DATE (TIMESTAMP_NTZ)
ORDER_ESTIMATED_DELIVERY_DATE (TIMESTAMP_NTZ)

-- order_items
ORDER_ID (VARCHAR)
ORDER_ITEM_ID (NUMBER)
PRODUCT_ID (VARCHAR)
SELLER_ID (VARCHAR)
SHIPPING_LIMIT_DATE (TIMESTAMP_NTZ)
PRICE (NUMBER(38,2))
FREIGHT_VALUE (NUMBER(38,2))

-- order_payments
ORDER_ID (VARCHAR)
PAYMENT_SEQUENTIAL (NUMBER)
PAYMENT_TYPE (VARCHAR)
PAYMENT_INSTALLMENTS (NUMBER)
PAYMENT_VALUE (NUMBER(38,2))

-- order_reviews
REVIEW_ID (VARCHAR)
ORDER_ID (VARCHAR)
REVIEW_SCORE (NUMBER)
REVIEW_COMMENT_TITLE (VARCHAR)
REVIEW_COMMENT_MESSAGE (VARCHAR)
REVIEW_CREATION_DATE (TIMESTAMP_NTZ)
REVIEW_ANSWER_TIMESTAMP (TIMESTAMP_NTZ)

-- products
PRODUCT_ID (VARCHAR)
PRODUCT_CATEGORY_NAME (VARCHAR)
PRODUCT_NAME_LENGHT (NUMBER)
PRODUCT_DESCRIPTION_LENGHT (NUMBER)
PRODUCT_PHOTOS_QTY (NUMBER)
PRODUCT_WEIGHT_G (NUMBER)
PRODUCT_LENGTH_CM (NUMBER)
PRODUCT_HEIGHT_CM (NUMBER)
PRODUCT_WIDTH_CM (NUMBER)

-- product_category_name_translation
C1 (VARCHAR)  -- Portuguese category name
C2 (VARCHAR)  -- English category name

-- sellers
SELLER_ID (VARCHAR)
SELLER_ZIP_CODE_PREFIX (NUMBER)
SELLER_CITY (VARCHAR)
SELLER_STATE (VARCHAR)
```

---

## Section 1: Basic Joins (🟢 Beginner)

### Question 1.1: Customer Orders
**Difficulty:** 🟢 Beginner

**Problem:** Write a query to show all customers with their order IDs and order dates. Include customers who have never placed an order (show NULL for order information).

**Expected Output:**
- Customer ID
- Customer Unique ID
- Order ID (NULL if no orders)
- Order Purchase Timestamp (NULL if no orders)

**Hint:** Think about which join type preserves all customers.

---

### Question 1.2: Order Details with Products
**Difficulty:** 🟢 Beginner

**Problem:** Show all order items with their product information. Display:
- Order ID
- Order Item ID
- Product ID
- Product Category Name
- Price
- Freight Value

**Expected Output:** One row per order item with product details.

**Hint:** You'll need to join `order_items` with `products`.

---

### Question 1.3: Orders Without Items
**Difficulty:** 🟢 Beginner

**Problem:** Find all orders that have no order items. Show:
- Order ID
- Order Status
- Order Purchase Timestamp

**Hint:** Use a LEFT JOIN and filter for NULLs.
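
The LEFT JOIN plus NULL filter pattern from the hint can be sketched on toy data. The `parent` and `child` tables and their columns are hypothetical and are not part of `CETPA_DB.PUBLIC`; this is only an illustration of the pattern, not a solution to the exercise.

```sql
-- Hypothetical toy tables; names and values are illustrative only.
WITH parent AS (
    SELECT * FROM (VALUES (1), (2), (3)) AS v (parent_id)
),
child AS (
    SELECT * FROM (VALUES (10, 1), (11, 1), (12, 3)) AS v (child_id, parent_id)
)
SELECT p.parent_id
FROM parent AS p
LEFT JOIN child AS c
    ON c.parent_id = p.parent_id
WHERE c.parent_id IS NULL;  -- keeps only parents with no matching child row
```

With this toy data only `parent_id = 2` is returned. The NULL check belongs in `WHERE`: placed in the `ON` clause it would change which rows match instead of filtering the joined result.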

---

## Section 2: Multiple Joins (🟡 Intermediate)

### Question 2.1: Complete Order Information
**Difficulty:** 🟡 Intermediate

**Problem:** Create a comprehensive order report showing:
- Order ID
- Customer City and State
- Order Status
- Order Purchase Timestamp
- Product Category Name (English translation if available)
- Price per item
- Payment Type
- Payment Value

**Expected Output:** One row per order item with all related information.

**Hint:** You'll need multiple joins: orders → customers, orders → order_items → products → category translation, orders → order_payments.

---

### Question 2.2: Customer Order Summary
**Difficulty:** 🟡 Intermediate

**Problem:** Show all customers with their order statistics:
- Customer ID
- Customer Unique ID
- Customer City
- Customer State
- Total number of orders
- Total amount spent (sum of payment values)
- Average order value

Include customers who have never placed an order (show 0 for their metrics).

**Hint:** Use LEFT JOINs and aggregate functions. Remember to handle NULLs.
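
As a rough sketch of how aggregates behave on the unmatched side of a LEFT JOIN, consider the toy example below. The `people` and `purchases` tables are hypothetical and exist only for illustration.

```sql
-- Hypothetical toy tables; names and values are illustrative only.
WITH people AS (
    SELECT * FROM (VALUES ('a'), ('b')) AS v (person_id)
),
purchases AS (
    SELECT * FROM (VALUES ('a', 10.00), ('a', 5.00)) AS v (person_id, amount)
)
SELECT
    p.person_id,
    COUNT(pu.person_id)         AS purchase_count,  -- counts matched rows only, so 'b' gets 0
    COALESCE(SUM(pu.amount), 0) AS total_amount     -- SUM over no matched rows is NULL; COALESCE makes it 0
FROM people AS p
LEFT JOIN purchases AS pu
    ON pu.person_id = p.person_id
GROUP BY p.person_id
ORDER BY p.person_id;
```

Note that `COUNT(*)` would report 1 for person `'b'`, because the LEFT JOIN still produces one (all-NULL) row for them. Counting a column from the right-hand table avoids that.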

---

### Question 2.3: Seller Performance
**Difficulty:** 🟡 Intermediate

**Problem:** Show seller performance metrics:
- Seller ID
- Seller City
- Seller State
- Number of unique orders
- Total revenue (sum of price + freight_value)
- Average order value
- Number of unique products sold

**Expected Output:** One row per seller with aggregated metrics.

**Hint:** Join `sellers` with `order_items`, then aggregate.

---

## Section 3: Handling NULLs (🟡 Intermediate)

### Question 3.1: Safe Revenue Calculation
**Difficulty:** 🟡 Intermediate

**Problem:** Calculate total revenue per order, handling NULLs properly. Revenue should be calculated as:
- Sum of (price + freight_value) for all items in an order
- If any value is NULL, treat it as 0

Show:
- Order ID
- Order Status
- Calculated Revenue (never NULL)

**Hint:** Use COALESCE to handle NULLs in calculations.

---

### Question 3.2: Customer Display Names
**Difficulty:** 🟢 Beginner

**Problem:** Create customer display names in the format: "Customer [CUSTOMER_ID] from [CITY], [STATE]"

Handle NULLs:
- If city is NULL, use "Unknown City"
- If state is NULL, use "Unknown State"
- Customer ID should never be NULL

**Hint:** Use COALESCE for NULL handling and string concatenation with `||` (in SQL Server, use `+` or `CONCAT()` instead of `||`).
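
The reason COALESCE matters here is that in Snowflake `||` returns NULL as soon as any operand is NULL. A minimal sketch on a hypothetical `places` table (not one of the exercise tables):

```sql
-- Hypothetical toy table; names and values are illustrative only.
WITH places AS (
    SELECT * FROM (VALUES
        ('p1', 'lisbon', 'PT'),
        ('p2', NULL,     'BR'),
        ('p3', NULL,     NULL)
    ) AS v (place_id, city, region)
)
SELECT
    place_id,
    'Place ' || place_id || ' in ' || city || ', ' || region AS raw_label,   -- NULL for p2 and p3
    'Place ' || place_id || ' in '
        || COALESCE(city, 'n/a') || ', '
        || COALESCE(region, 'n/a')                           AS safe_label   -- never NULL here
FROM places
ORDER BY place_id;
```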

---

### Question 3.3: Product Category Analysis
**Difficulty:** 🟡 Intermediate

**Problem:** Analyze products by category, ensuring NULL categories are handled:
- Category Name (use 'Uncategorized' if NULL)
- Number of products
- Average product weight (in grams)
- Average combined product dimensions (length + width + height, in cm)

**Expected Output:** One row per category (including 'Uncategorized').

**Hint:** Use COALESCE in GROUP BY and aggregate functions.

---

## Section 4: GROUP BY and Aggregations (🟡 Intermediate)

### Question 4.1: Order Status Summary
**Difficulty:** 🟢 Beginner

**Problem:** Create a summary of orders by status:
- Order Status
- Count of orders
- Total revenue (sum of payment values)
- Average order value
- Minimum order value
- Maximum order value

Order by order count (descending).

**Hint:** Use GROUP BY with multiple aggregate functions.

---

### Question 4.2: Top Customers by Revenue
**Difficulty:** 🟡 Intermediate

**Problem:** Find the top 10 customers by total revenue:
- Customer ID
- Customer Unique ID
- Customer City
- Total number of orders
- Total revenue (sum of payment values)
- Average order value

Order by total revenue (descending).

**Hint:** Join customers with orders and order_payments, then aggregate and use LIMIT (`TOP 10` in SQL Server).

---

### Question 4.3: Monthly Sales Report
**Difficulty:** 🟡 Intermediate

**Problem:** Create a monthly sales report:
- Year
- Month
- Number of orders
- Number of unique customers
- Total revenue
- Average order value

Order by year and month.

**Hint:** Use EXTRACT() or DATE_PART() to get year and month from timestamps (in SQL Server, use DATEPART() or YEAR()/MONTH()).
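
For reference, a minimal sketch of the date-part functions on a single literal timestamp. The timestamp value and column aliases are made up for illustration.

```sql
-- One hard-coded timestamp, purely for illustration.
SELECT
    ts,
    EXTRACT(YEAR FROM ts)   AS yr,           -- 2018
    DATE_PART(MONTH, ts)    AS mon,          -- 3
    DATE_TRUNC('MONTH', ts) AS month_start   -- 2018-03-01 00:00:00
FROM (
    SELECT '2018-03-15 10:30:00'::TIMESTAMP_NTZ AS ts
) AS t;
```

`DATE_TRUNC` is an alternative worth knowing: it yields a single sortable month key instead of separate year and month columns.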

---

### Question 4.4: Product Category Performance
**Difficulty:** 🟡 Intermediate

**Problem:** Show product category performance:
- Category Name (English translation, or Portuguese if translation missing)
- Number of unique products
- Total quantity sold (count of order item rows)
- Total revenue
- Average product price

Only include categories with revenue greater than 1000. Order by total revenue (descending).

**Hint:** Join products with order_items, use category translation, aggregate, and filter with HAVING.

---

### Question 4.5: Payment Type Analysis
**Difficulty:** 🟢 Beginner

**Problem:** Analyze payment types:
- Payment Type
- Number of payments
- Total payment value
- Average payment value
- Number of unique orders

Order by total payment value (descending).

**Hint:** Aggregate from order_payments table.

---

## Section 5: Window Functions (🟡 Intermediate to 🔴 Advanced)

### Question 5.1: Order Ranking by Customer
**Difficulty:** 🟡 Intermediate

**Problem:** Rank orders by value within each customer:
- Customer ID
- Order ID
- Order Purchase Timestamp
- Total Order Value (sum of payment values)
- Rank within customer (1 = highest value order)

Order by customer ID, then by rank.

**Hint:** Use RANK() or DENSE_RANK() with PARTITION BY customer_id.

---

### Question 5.2: Running Total by Date
**Difficulty:** 🟡 Intermediate

**Problem:** Calculate running total of revenue by order date:
- Order Purchase Timestamp (date only)
- Daily Revenue (sum of payment values for that day)
- Running Total (cumulative sum up to that date)

Order by date.

**Hint:** Use SUM() OVER() with ORDER BY date. You may need to group by date first.

---

### Question 5.3: Top Product per Category
**Difficulty:** 🟡 Intermediate

**Problem:** Find the top-selling product (by quantity) in each category:
- Category Name (English)
- Product ID
- Product Category Name (Portuguese)
- Total Quantity Sold
- Rank in Category

Show only rank 1 products.

**Hint:** Use RANK() or ROW_NUMBER() with PARTITION BY category, then filter with QUALIFY (Snowflake) or a subquery (SQL Server).
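
A small sketch of filtering on a window function with QUALIFY, using a hypothetical `sales` table (the names `grp`, `item`, and `qty` are illustrative and not part of the exercise schema):

```sql
-- Hypothetical toy table; names and values are illustrative only.
WITH sales AS (
    SELECT * FROM (VALUES
        ('fruit', 'apple', 5),
        ('fruit', 'pear',  9),
        ('veg',   'leek',  4)
    ) AS v (grp, item, qty)
)
SELECT
    grp,
    item,
    qty,
    RANK() OVER (PARTITION BY grp ORDER BY qty DESC) AS rnk
FROM sales
QUALIFY rnk = 1;   -- Snowflake: filter on the window result without a subquery

-- Portable form for engines without QUALIFY (e.g. SQL Server):
-- SELECT grp, item, qty, rnk
-- FROM (
--     SELECT grp, item, qty,
--            RANK() OVER (PARTITION BY grp ORDER BY qty DESC) AS rnk
--     FROM sales
-- ) AS ranked
-- WHERE rnk = 1;
```

RANK() keeps every row tied for first place, whereas ROW_NUMBER() keeps exactly one row per group and picks arbitrarily among ties unless the ORDER BY breaks them.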

---

### Question 5.4: Customer Order Sequence
**Difficulty:** 🟡 Intermediate

**Problem:** For each customer, show their orders with:
- Customer ID
- Order ID
- Order Purchase Timestamp
- Order Sequence Number (1st order, 2nd order, etc. for that customer)
- Days Since Previous Order (NULL for first order)

Order by customer ID, then by order date.

**Hint:** Use ROW_NUMBER() for sequence and LAG() for previous order date.
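
The combination of ROW_NUMBER() and LAG() can be sketched on a hypothetical `visits` table as follows; table and column names are illustrative assumptions.

```sql
-- Hypothetical toy table; names and values are illustrative only.
WITH visits AS (
    SELECT v.visitor, TO_DATE(v.visit_day) AS visit_day
    FROM (VALUES
        ('x', '2024-01-01'),
        ('x', '2024-01-11'),
        ('y', '2024-02-01')
    ) AS v (visitor, visit_day)
)
SELECT
    visitor,
    visit_day,
    ROW_NUMBER() OVER (PARTITION BY visitor ORDER BY visit_day) AS visit_seq,
    DATEDIFF(
        'day',
        LAG(visit_day) OVER (PARTITION BY visitor ORDER BY visit_day),  -- previous visit, NULL for the first
        visit_day
    ) AS days_since_prev
FROM visits
ORDER BY visitor, visit_day;
```

For visitor `'x'` this yields sequence numbers 1 and 2, with `days_since_prev` of NULL and 10. The first row per visitor is NULL because LAG() has no earlier row to return.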

---

### Question 5.5: Moving Average Revenue
**Difficulty:** 🔴 Advanced

**Problem:** Calculate 7-day moving average of daily revenue:
- Date
- Daily Revenue
- 7-Day Moving Average (average of current day + 6 previous days)

Order by date.

**Hint:** Use AVG() OVER() with ROWS BETWEEN 6 PRECEDING AND CURRENT ROW.
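
To illustrate how a ROWS frame works, here is a sketch with a 3-row window on a hypothetical `daily` table (a shorter window than the 7 days in the exercise, to keep the toy data small):

```sql
-- Hypothetical toy table; names and values are illustrative only.
WITH daily AS (
    SELECT TO_DATE(v.d) AS d, v.amount
    FROM (VALUES
        ('2024-01-01', 10),
        ('2024-01-02', 20),
        ('2024-01-03', 30),
        ('2024-01-04', 40)
    ) AS v (d, amount)
)
SELECT
    d,
    amount,
    AVG(amount) OVER (
        ORDER BY d
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- current row plus up to 2 earlier rows
    ) AS moving_avg_3
FROM daily
ORDER BY d;
```

The averages come out as 10, 15, 20, and 30: the first rows simply average over fewer rows. A ROWS frame counts rows, not calendar days, so it matches an N-day average only when there is exactly one row per day with no missing dates.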

---

### Question 5.6: Percent of Total Revenue
**Difficulty:** 🟡 Intermediate

**Problem:** Show each order with:
- Order ID
- Order Purchase Timestamp
- Order Revenue (sum of payment values)
- Percent of Total Revenue (what percentage this order represents of all orders)

Order by order revenue (descending).

**Hint:** Use SUM() OVER() without PARTITION BY to get grand total, then calculate percentage.
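
A minimal sketch of the grand-total pattern on a hypothetical two-row table (names are illustrative):

```sql
-- Hypothetical toy table; names and values are illustrative only.
WITH totals AS (
    SELECT * FROM (VALUES ('a', 25), ('b', 75)) AS v (k, amount)
)
SELECT
    k,
    amount,
    ROUND(100.0 * amount / SUM(amount) OVER (), 2) AS pct_of_total  -- empty OVER () = whole result set
FROM totals
ORDER BY amount DESC;
```

This returns 75.00 for `'b'` and 25.00 for `'a'`. Multiplying by `100.0` instead of `100` keeps the division from truncating in engines that use integer division, such as SQL Server.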

---

## Section 6: Common Table Expressions (CTEs) (🟡 Intermediate to 🔴 Advanced)

### Question 6.1: Customer Lifetime Value
**Difficulty:** 🟡 Intermediate

**Problem:** Calculate customer lifetime value using CTEs:
- Customer ID
- Customer City
- First Order Date
- Last Order Date
- Total Orders
- Total Revenue
- Average Order Value
- Customer Tenure (days between first and last order)

**Hint:** Use CTE to calculate customer totals, then add date calculations.

---

### Question 6.2: Find Duplicate Orders
**Difficulty:** 🟡 Intermediate

**Problem:** Find duplicate orders (same customer, same date, same total payment value):
- Customer ID
- Order Purchase Timestamp (date only)
- Total Payment Value
- Order IDs (list all duplicate order IDs)
- Duplicate Count

**Hint:** Use CTE to identify duplicates, then join back to show all order IDs.
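
As an alternative to joining back, the IDs of each duplicate group can be collected in one pass with LISTAGG. The sketch below uses a hypothetical `events` table; it shows this alternative technique, not the join-back approach from the hint.

```sql
-- Hypothetical toy table; names and values are illustrative only.
WITH events AS (
    SELECT * FROM (VALUES
        ('e1', 'u1', 50),
        ('e2', 'u1', 50),
        ('e3', 'u2', 20)
    ) AS v (event_id, user_id, amount)
)
SELECT
    user_id,
    amount,
    COUNT(*)                                                 AS duplicate_count,
    LISTAGG(event_id, ', ') WITHIN GROUP (ORDER BY event_id) AS event_ids
FROM events
GROUP BY user_id, amount
HAVING COUNT(*) > 1;   -- keep only groups with more than one row
```

Only the `u1` / `50` group is returned, with `event_ids` of `e1, e2`. In SQL Server the equivalent aggregate is `STRING_AGG`.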

---

### Question 6.3: Product Sales Trend
**Difficulty:** 🔴 Advanced

**Problem:** For each product, show monthly sales trend:
- Product ID
- Year-Month
- Quantity Sold in Month
- Previous Month Quantity
- Month-over-Month Change
- Running Total Quantity

**Hint:** Use multiple CTEs: first aggregate by product and month, then use window functions for trends.

---

### Question 6.4: Customer Segmentation
**Difficulty:** 🔴 Advanced

**Problem:** Segment customers based on their order behavior:
- Customer ID
- Total Orders
- Total Revenue
- Average Order Value
- Days Since Last Order
- Customer Segment:
  - 'VIP' if total revenue >= 1000
  - 'Regular' if total revenue >= 500 and < 1000
  - 'Casual' if total revenue < 500
- Recency Status:
  - 'Active' if last order was 30 days ago or less
  - 'At Risk' if last order was 31-90 days ago
  - 'Churned' if last order was more than 90 days ago

**Hint:** Use CTEs to calculate metrics, then use CASE statements for segmentation.

---

### Question 6.5: Top 3 Products per Category
**Difficulty:** 🟡 Intermediate

**Problem:** Find top 3 products by revenue in each category:
- Category Name (English)
- Product ID
- Total Revenue
- Rank in Category

Show only top 3 products per category.

**Hint:** Use a CTE to calculate product revenue, then RANK() with PARTITION BY category, and filter with QUALIFY (Snowflake) or a subquery (SQL Server).

---

## Section 7: Complex Multi-Step Queries (🔴 Advanced)

### Question 7.1: Order Fulfillment Analysis
**Difficulty:** 🔴 Advanced

**Problem:** Analyze order fulfillment performance:
- Order Status
- Number of Orders
- Average Days to Approval (ORDER_APPROVED_AT - ORDER_PURCHASE_TIMESTAMP)
- Average Days to Carrier (ORDER_DELIVERED_CARRIER_DATE - ORDER_APPROVED_AT)
- Average Days to Customer (ORDER_DELIVERED_CUSTOMER_DATE - ORDER_DELIVERED_CARRIER_DATE)
- Average Total Delivery Time
- On-Time Delivery Rate (% delivered before ORDER_ESTIMATED_DELIVERY_DATE)

**Hint:** Use CTEs to calculate time differences, then aggregate and calculate percentages.

---

### Question 7.2: Review Score Impact on Revenue
**Difficulty:** 🔴 Advanced

**Problem:** Analyze the relationship between review scores and revenue:
- Review Score
- Number of Reviews
- Average Order Value (for orders with this review score)
- Total Revenue
- Percentage of Total Reviews

Order by review score.

**Hint:** Join order_reviews with orders and order_payments, aggregate by review score.

---

### Question 7.3: Geographic Sales Analysis
**Difficulty:** 🔴 Advanced

**Problem:** Analyze sales by geographic region:
- State
- Number of Customers
- Number of Sellers
- Number of Orders
- Total Revenue
- Average Order Value
- Top Product Category (by revenue)

**Hint:** Use CTEs to join customers, sellers, orders, and products. Use window functions to find top category.

---

### Question 7.4: Payment Installment Analysis
**Difficulty:** 🔴 Advanced

**Problem:** Analyze payment patterns:
- Payment Type
- Average Installments
- Total Number of Payments
- Total Payment Value
- Average Payment Value
- Percentage of Orders Using This Payment Type

**Hint:** Aggregate order_payments, then calculate percentages using window functions or subqueries.

---

### Question 7.5: Product Recommendation Score
**Difficulty:** 🔴 Advanced

**Problem:** Create a product recommendation score based on:
- Product ID
- Category Name
- Total Quantity Sold
- Average Review Score (from order_reviews)
- Number of Reviews
- Revenue Rank in Category
- Final Score = (Quantity Sold * 0.4) + (Avg Review Score * 20 * 0.3) + ((Category Rank Score) * 0.3)

Show top 20 products by recommendation score.

**Hint:** Use multiple CTEs to calculate different components, then combine them.

---

## Section 8: Data Quality and Troubleshooting (🟡 Intermediate)

### Question 8.1: Data Quality Check
**Difficulty:** 🟡 Intermediate

**Problem:** Create a data quality report showing:
- Table Name
- Column Name
- Total Rows
- NULL Count
- NULL Percentage
- Data Quality Status (PASS if NULL% < 5%, WARNING if 5-10%, FAIL if > 10%)

Check critical columns: CUSTOMER_ID in orders, PRODUCT_ID in order_items, ORDER_ID in order_payments.

**Hint:** Use UNION ALL to combine results from multiple tables, use CASE for status.
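
The shape of such a report can be sketched with two hypothetical tables, `table_one` and `table_two`, standing in for the real ones. The status thresholds are the ones given in the problem statement.

```sql
-- Hypothetical toy tables; names and values are illustrative only.
WITH table_one AS (
    SELECT * FROM (VALUES ('a'), ('b'), (NULL), ('d')) AS v (key_col)
),
table_two AS (
    SELECT * FROM (VALUES ('x'), ('y')) AS v (key_col)
),
checks AS (
    -- One SELECT per (table, column) check, stacked with UNION ALL.
    SELECT 'table_one' AS table_name, 'key_col' AS column_name,
           COUNT(*) AS total_rows,
           COUNT(*) - COUNT(key_col) AS null_count   -- COUNT(col) skips NULLs
    FROM table_one
    UNION ALL
    SELECT 'table_two', 'key_col', COUNT(*), COUNT(*) - COUNT(key_col)
    FROM table_two
)
SELECT
    table_name,
    column_name,
    total_rows,
    null_count,
    ROUND(100.0 * null_count / NULLIF(total_rows, 0), 2) AS null_pct,  -- NULLIF avoids division by zero
    CASE
        WHEN 100.0 * null_count / NULLIF(total_rows, 0) < 5   THEN 'PASS'
        WHEN 100.0 * null_count / NULLIF(total_rows, 0) <= 10 THEN 'WARNING'
        ELSE 'FAIL'   -- also reached for an empty table, where the percentage is NULL
    END AS status
FROM checks
ORDER BY table_name;
```

Here `table_one` has 1 NULL in 4 rows (25%, FAIL) and `table_two` has none (0%, PASS).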

---

### Question 8.2: Find Orphaned Records
**Difficulty:** 🟡 Intermediate

**Problem:** Find orphaned records:
- Order Items without valid Order ID
- Order Payments without valid Order ID
- Order Reviews without valid Order ID
- Orders without valid Customer ID

Show the count of orphaned records for each scenario.

**Hint:** Use LEFT JOINs and filter for NULLs, or use NOT EXISTS.
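
The NOT EXISTS variant can be sketched as follows, again on hypothetical `parent` and `child` tables that are not part of the exercise schema:

```sql
-- Hypothetical toy tables; names and values are illustrative only.
WITH parent AS (
    SELECT * FROM (VALUES (1), (2)) AS v (parent_id)
),
child AS (
    SELECT * FROM (VALUES (10, 1), (11, 4)) AS v (child_id, parent_id)
)
SELECT COUNT(*) AS orphaned_children
FROM child AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM parent AS p
    WHERE p.parent_id = c.parent_id   -- correlated lookup into the parent table
);
```

This counts 1 orphan (child 11, whose `parent_id = 4` does not exist). NOT EXISTS is generally safer than `NOT IN (subquery)`, which returns no rows at all if the subquery produces a NULL.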

---

### Question 8.3: Inconsistent Data Detection
**Difficulty:** 🔴 Advanced

**Problem:** Detect data inconsistencies:
- Orders where total payment value doesn't match sum of order_items (price + freight_value)
- Orders with payment value but no order items
- Orders with order items but no payment
- Orders delivered before being approved

Show order ID and the type of inconsistency.

**Hint:** Use CTEs to calculate expected values, then compare with actual values using CASE statements.

---

## Framework for Solving SQL Problems

### Step 1: Understand the Problem
- What data do I need?
- What tables are involved?
- What relationships exist between tables?
- What is the expected output format?

### Step 2: Identify Required Operations
- Do I need JOINs? Which type?
- Do I need aggregations? (GROUP BY)
- Do I need window functions? (OVER)
- Do I need CTEs? (WITH)
- Do I need to handle NULLs? (COALESCE)

### Step 3: Plan the Query Structure
- Start with the base table(s)
- Add JOINs in logical order
- Apply filters (WHERE)
- Group if needed (GROUP BY)
- Apply window functions if needed
- Order results (ORDER BY)
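
The ordering above can be sketched as a single query. The `reps` and `sales` tables are hypothetical and serve only to show where each clause sits.

```sql
-- Hypothetical toy tables; names and values are illustrative only.
WITH reps AS (
    SELECT * FROM (VALUES (1, 'north'), (2, 'north'), (3, 'south')) AS v (rep_id, region)
),
sales AS (
    SELECT * FROM (VALUES
        (1, 100, 'ok'),
        (1,  50, 'void'),
        (2, 120, 'ok'),
        (3,  80, 'ok')
    ) AS v (rep_id, amount, status)
)
SELECT
    r.region,
    r.rep_id,
    SUM(s.amount) AS rep_total,                         -- aggregation per group
    RANK() OVER (PARTITION BY r.region
                 ORDER BY SUM(s.amount) DESC) AS rank_in_region  -- window function, evaluated after GROUP BY
FROM reps AS r                                          -- base table
INNER JOIN sales AS s
    ON s.rep_id = r.rep_id                              -- join
WHERE s.status = 'ok'                                   -- row filter, applied before grouping
GROUP BY r.region, r.rep_id                             -- grouping
ORDER BY r.region, rank_in_region;                      -- final ordering
```

Building the query in this order, and running it after each added clause, makes it easier to see which step introduces an unexpected row count.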

### Step 4: Handle Edge Cases
- NULL values
- Empty result sets
- Duplicate records
- Data type conversions

### Step 5: Test and Refine
- Test with sample data
- Verify NULL handling
- Check aggregation logic
- Validate join conditions

---

## Tips for Success

1. **Start Simple:** Begin with basic SELECT, then add JOINs, then aggregations
2. **Test Incrementally:** Build your query step by step, testing each addition
3. **Handle NULLs Early:** Use COALESCE/NVL at the beginning of calculations
4. **Use CTEs for Complexity:** Break complex queries into logical CTEs
5. **Verify Join Types:** Ensure you're using the correct join type (INNER vs LEFT)
6. **Check Aggregations:** Make sure all non-aggregated columns are in GROUP BY
7. **Test with Sample Data:** Run queries on small subsets first
8. **Document Your Logic:** Add comments explaining complex parts

---

**Good luck with your SQL practice!**
