# Day 1, Block B: SQL Foundations with DuckDB

**Duration:** 30-40 minutes  
**Course:** ECBS5294 - Introduction to Data Science: Working with Data  
**Instructor:** Eduardo Ariño de la Rubia

---

## Learning Objectives

By the end of this session, you will be able to:

1. **Explain** why SQL is valuable for data analysis
2. **Write basic SQL queries** using SELECT, WHERE, and ORDER BY
3. **Filter data** using comparison operators, AND/OR logic, and pattern matching
4. **Handle NULL values** correctly (IS NULL, not = NULL)
5. **Create calculated columns** in query results
6. **Sort results** with ORDER BY
7. **Build complex queries** combining multiple concepts

---

## 0. Connecting to This Morning

Earlier today in **Block A**, we learned to make data **tidy**. We:
- Structured data properly (each variable a column, each observation a row)
- Fixed types (converted strings to numbers, dates)
- Handled missing values (NaN, NULL)
- Validated primary keys

**Now in Block B**, we **query** that tidy data with SQL.

> **Key insight:** Tidy data makes SQL queries straightforward. Messy data makes them painful.

This is why data cleaning comes first!

---

## 1. Why SQL?

### The Hook

> **"SQL lets you ask questions of data in the language of business logic, not programming loops."**

Think about it: Instead of writing loops and if-statements to find "all coffee sales over $5," you just write:

```sql
SELECT * FROM sales WHERE category = 'Coffee' AND price > 5
```

You **declare what you want**, not how to get it. The database figures out the most efficient way to get your answer.

### Why SQL Matters

**1. Declarative, Not Procedural**
- Say *what* you want, not *how* to get it
- Database optimizes the query for you
- Focus on business logic, not algorithms

**2. Set-Based Operations**
- Operate on entire sets of rows at once
- Not row-by-row loops (which are slow)
- Optimized for bulk operations

**3. Standard and Portable**
- SQL skills transfer across systems
- PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB all use SQL
- Learn once, use everywhere

**4. Foundation for Advanced Concepts**
- Required for joins (Day 2)
- Common Table Expressions (CTEs)
- Window functions
- Subqueries

### Example: Python vs SQL

**Python (procedural):**
```python
total = 0
for row in data:
    if row['category'] == 'Coffee' and row['price'] > 5:
        total += row['price'] * row['quantity']
```

**SQL (declarative):**
```sql
SELECT SUM(price * quantity) AS total_revenue
FROM data
WHERE category = 'Coffee' AND price > 5
```

**Which is clearer? Which reads more like English?**

---

## 2. Why DuckDB?

### What is DuckDB?

DuckDB is an **embedded analytical database**. Think of it as "SQLite for analytics."

**Key Features:**
- **No server required** - runs in your Python process
- **Query files directly** - CSVs, Parquet, no loading step!
- **Fast** - columnar storage, optimized for analytics
- **Full SQL support** - including window functions, CTEs, complex joins
- **Works with pandas** - easy conversion to/from DataFrames

### When to Use DuckDB

✅ **Use DuckDB when:**
- Analyzing data locally (CSVs, Parquet files)
- Exploring datasets (millions of rows)
- Building data pipelines
- Learning SQL
- You need speed without server setup

❌ **Don't use DuckDB when:**
- You need a web application database (use PostgreSQL)
- You need multi-user concurrent writes (use PostgreSQL)
- You're building a production transactional system

### Why We're Teaching DuckDB

1. **Zero setup** - just `pip install duckdb`
2. **Same SQL** - skills transfer to PostgreSQL, BigQuery, etc.
3. **Real performance** - handles millions of rows on a laptop
4. **Modern tool** - used by data teams at companies like Mozilla, Datadog

---

## 3. Setup

Let's install DuckDB and get connected.

In [None]:
# Install DuckDB (run once)
!pip install duckdb -q

In [None]:
# Import libraries
import duckdb
import pandas as pd
import numpy as np
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")

### Connect to DuckDB

We'll create an **in-memory database** - it exists only while this notebook is running.

In [None]:
# Connect to in-memory database
con = duckdb.connect(':memory:')

print("✅ Connected to DuckDB!")
print(f"DuckDB version: {duckdb.__version__}")

### DuckDB's Superpower: Query Files Directly!

Most databases require you to:
1. Load data into the database
2. Then query it

DuckDB can **query CSV files directly** without loading them first. Let's try it:

In [None]:
# Query a CSV file directly!
result = con.execute("""
    SELECT * 
    FROM '../../data/day1/dirty_cafe_sales.csv' 
    LIMIT 5
""").df()  # .df() converts result to pandas DataFrame

result

**What just happened?**

1. We wrote SQL: `SELECT * FROM 'file.csv' LIMIT 5`
2. DuckDB read the CSV directly (no loading!)
3. We got back the first 5 rows
4. `.df()` converted it to a pandas DataFrame

This is often **much faster** than `pd.read_csv()` for large files, especially when you only need some rows or columns, because the whole file never has to be materialized as a DataFrame first.

---

## 4. Quick Data Prep: Types Matter!

For this notebook, we'll use the cafe sales dataset from Block A. 

**Critical Block A Callback:** Remember **Block A, Section 5 (Types & Pitfalls)**? How CSV files often load everything as text? We need to fix that with SQL!

*If you need a refresher, open `day1_block_a_tidy_foundations.ipynb` and jump to Section 5.*

Let's load the data and see what we're working with.

In [None]:
# First, let's see what DuckDB infers from the CSV
con.execute("DESCRIBE SELECT * FROM '../../data/day1/dirty_cafe_sales.csv'").df()

**Uh oh!** Everything is `VARCHAR` (text). This is exactly what we learned about in Block A!

**The problem:**
- Can't do math on text: `"5.00" + "3.00"` doesn't work!
- Can't sort properly: `"10"` comes before `"2"` alphabetically
- Can't compare: `WHERE "Price Per Unit" > 5` will fail

**The solution:** Cast columns to proper types when we load the data.

In [None]:
# Create a clean table with proper types
# We use TRY_CAST so bad values become NULL instead of causing errors
con.execute("""
    CREATE OR REPLACE TABLE cafe AS
    SELECT 
        "Transaction ID",
        Item,
        TRY_CAST(Quantity AS INTEGER) AS Quantity,
        TRY_CAST("Price Per Unit" AS DECIMAL(10,2)) AS "Price Per Unit",
        TRY_CAST("Total Spent" AS DECIMAL(10,2)) AS "Total Spent",
        "Payment Method",
        Location,
        TRY_CAST("Transaction Date" AS DATE) AS "Transaction Date"
    FROM '../../data/day1/dirty_cafe_sales.csv'
""")

print("✅ Table 'cafe' created with proper types!")

**What we did:**
- `TRY_CAST(column AS type)` - Convert column to specified type
- **INTEGER** for whole numbers (Quantity)
- **DECIMAL(10,2)** for money (10 total digits, 2 after decimal)
- **DATE** for dates

**Why TRY_CAST instead of CAST?**
- `CAST` errors if conversion fails
- `TRY_CAST` returns NULL if conversion fails
- Safer for dirty data!

Let's verify the types are fixed:

In [None]:
# Check the types of our clean table
con.execute("DESCRIBE cafe").df()

**Perfect!** Now:
- Quantity is INTEGER ✅
- "Price Per Unit" is DECIMAL ✅  
- "Transaction Date" is DATE ✅

We can now do math, comparisons, and sorting properly. This is what you learned in Block A - **types matter**!

---

In [None]:
# Check how many rows we have
row_count = con.execute("SELECT COUNT(*) as total FROM cafe").df()
print(f"Total rows: {row_count['total'][0]:,}")

In [None]:
# Peek at the first rows (and the column names)
con.execute("SELECT * FROM cafe LIMIT 3").df()

**Note:** This data is "dirty" - it has missing values, inconsistent formats, etc. That's intentional! Real data is messy. In Block A, you learned how to clean it. Now we'll learn how to query it.

---

## 5. SELECT Basics

### The SELECT Statement

> **SELECT tells SQL what columns to return**

Basic structure:
```sql
SELECT column1, column2, column3
FROM table_name
```

Let's start simple:

### Example 1: Select Specific Columns

**Business question:** "Show me just the item names and prices."

In [None]:
con.execute("""
    SELECT 
        Item,
        "Price Per Unit"
    FROM cafe
    LIMIT 10
""").df()

**Note:** We used `LIMIT 10` to get just 10 rows. This is great for exploring data quickly.

### Example 2: Select Multiple Columns

**Business question:** "Show me transaction details - what was ordered, how much, and when."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        Quantity,
        "Price Per Unit",
        "Transaction Date"
    FROM cafe
    LIMIT 10
""").df()

### Example 3: SELECT * (All Columns)

The `*` wildcard means "all columns."

In [None]:
con.execute("""
    SELECT * 
    FROM cafe
    LIMIT 5
""").df()

**When to use SELECT *:**
- ✅ Exploring data ("What columns do I have?")
- ✅ Development/prototyping
- ❌ Production code (slow, unclear what you're getting)
- ❌ Large tables (returns too much data)

**Best practice:** Be explicit about columns in production queries.

### Example 4: Column Aliases with AS

Rename columns in your results using `AS`:

In [None]:
con.execute("""
    SELECT 
        "Transaction ID" AS txn_id,
        Item AS product_name,
        "Price Per Unit" AS unit_price
    FROM cafe
    LIMIT 10
""").df()

**Why use aliases?**
- Shorter names for convenience
- More descriptive names
- Rename calculated columns (we'll see this soon)

---

## 6. WHERE: Filtering Rows

> **WHERE filters which rows to return**

Think of it as: "Show me transactions **where** [condition is true]"

Structure:
```sql
SELECT columns
FROM table
WHERE condition
```

### Comparison Operators

| Operator | Meaning |
|----------|--------|
| `=` | Equal to |
| `!=` or `<>` | Not equal to |
| `>` | Greater than |
| `<` | Less than |
| `>=` | Greater than or equal |
| `<=` | Less than or equal |

### Example 1: Filter by Price

**Business question:** "Show me all premium items ($4 or more)."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE "Price Per Unit" >= 4.00
    LIMIT 10
""").df()

### Example 2: Filter by Category

**Business question:** "Show me only coffee sales."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE Item = 'Coffee'
    LIMIT 10
""").df()

**Note:** Text values need **single quotes**: `'Coffee'`, not `Coffee`

### Example 3: Multiple Conditions with AND

**Business question:** "Show me salad items priced at $4 or more."

**Both conditions must be true:**

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE Item = 'Salad' 
      AND "Price Per Unit" >= 4.00
    LIMIT 10
""").df()

### Example 4: Multiple Conditions with OR

**Business question:** "Show me coffee OR tea sales."

**Either condition can be true:**

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE Item = 'Coffee' 
       OR Item = 'Tea'
    LIMIT 10
""").df()

### Example 5: IN Operator (Cleaner OR)

Instead of multiple ORs, use `IN`:

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE Item IN ('Coffee', 'Tea', 'Espresso')
    LIMIT 10
""").df()

**Much cleaner!** Especially with many values.

### Example 6: BETWEEN for Ranges

**Business question:** "Show me items priced between $3 and $7." Note that `BETWEEN` is inclusive: both $3.00 and $7.00 match.

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE "Price Per Unit" BETWEEN 3.00 AND 7.00
    LIMIT 10
""").df()

---

### ⏸️ Pause and Try!

**Your task:** Write a query from scratch to find transactions where:
1. Price is between **$3 and $6** (use `BETWEEN`)
2. Item is **NOT 'Coffee'** (use `!=` or `<>`)

**Requirements:**
- Use the `cafe` table
- Combine both conditions with `AND`
- Show: Transaction ID, Item, and Price Per Unit
- Limit to 10 results

Replace the placeholder query in the cell below with your complete SQL query.

In [None]:
# Your turn! Write your query here:
con.execute("SELECT 1 AS todo").df()  # Replace this entire query with your answer

---

### Example 7: LIKE for Pattern Matching

**Business question:** "Find all items with 'Salad' in the name."

**Wildcards:**
- `%` = any number of characters
- `_` = exactly one character

In [None]:
con.execute("""
    SELECT DISTINCT
        Item
    FROM cafe
    WHERE Item LIKE '%Salad%'
    LIMIT 10
""").df()

**Pattern examples:**
- `'%Latte%'` - Contains "Latte" anywhere
- `'Latte%'` - Starts with "Latte"
- `'%Latte'` - Ends with "Latte"
- `'_offee'` - Exactly 6 characters, ends with "offee"

### Example 8: CRITICAL - NULL Handling

> **🚨 MOST IMPORTANT CONCEPT: NULL is special**

**NULL means "unknown" or "missing." It is NOT:**
- Zero
- Empty string
- False

**THE #1 MISTAKE: Using = NULL**

In [None]:
# ❌ WRONG - This doesn't work!
con.execute("""
    SELECT COUNT(*) as count
    FROM cafe
    WHERE "Payment Method" = NULL
""").df()

**Result: a count of 0.** This is WRONG! We have NULL values, but `= NULL` doesn't find them.

**Why?** In SQL, comparing anything to NULL with `=` yields NULL ("unknown"), never TRUE, so no row passes the filter. Even `NULL = NULL` is not TRUE!

In [None]:
# ✅ CORRECT - Use IS NULL
con.execute("""
    SELECT COUNT(*) as count
    FROM cafe
    WHERE "Payment Method" IS NULL
""").df()

**Much better!** Now we found the missing payment methods.

In [None]:
# Check for non-NULL values
con.execute("""
    SELECT COUNT(*) as count
    FROM cafe
    WHERE "Payment Method" IS NOT NULL
""").df()

**NULL in comparisons:**

In [None]:
# A comparison such as "Price Per Unit" > 0 is never TRUE for a NULL price,
# so those rows are filtered out even without an explicit IS NOT NULL
result_non_null = con.execute("""
    SELECT COUNT(*) as total_rows FROM cafe WHERE "Price Per Unit" IS NOT NULL
""").df()

result_filtered = con.execute("""
    SELECT COUNT(*) as filtered_rows FROM cafe WHERE "Price Per Unit" > 0
""").df()

print(f"Total non-NULL prices: {result_non_null['total_rows'][0]:,}")
print(f"Prices > 0: {result_filtered['filtered_rows'][0]:,}")
print("\n⚠️  Rows with NULL prices never pass a comparison filter!")

**Key Takeaways:**
1. ❌ **NEVER use `= NULL`** - it doesn't work
2. ✅ **Use `IS NULL`** to find NULL values
3. ✅ **Use `IS NOT NULL`** to exclude NULL values
4. ⚠️ Comparisons (`>`, `<`, `=`) against a NULL value are never TRUE, so `WHERE` drops those rows automatically

---

## 7. Calculated Columns

> **Create new columns in your query results**

You can perform calculations and create new columns on-the-fly.

### Example 1: Basic Calculation

**Business question:** "What's the total revenue per transaction?"

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit",
        Quantity,
        "Price Per Unit" * Quantity AS total_revenue
    FROM cafe
    LIMIT 10
""").df()

**Note:** The `AS total_revenue` gives our calculation a name. Without it, the column would be called something ugly like `("Price Per Unit" * Quantity)`.

### Example 2: Multiple Calculations

**Business question:** "Show revenue and estimated tax per transaction."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit",
        Quantity,
        "Price Per Unit" * Quantity AS revenue,
        "Price Per Unit" * Quantity * 0.10 AS tax,
        "Price Per Unit" * Quantity * 1.10 AS total_with_tax
    FROM cafe
    WHERE "Price Per Unit" IS NOT NULL AND Quantity IS NOT NULL
    LIMIT 10
""").df()

### Example 3: Using ROUND for Clean Output

**Clean up decimal places:**

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit",
        Quantity,
        ROUND("Price Per Unit" * Quantity, 2) AS revenue,
        ROUND("Price Per Unit" * Quantity * 0.10, 2) AS tax
    FROM cafe
    WHERE "Price Per Unit" IS NOT NULL AND Quantity IS NOT NULL
    LIMIT 10
""").df()

**Common calculations:**
- Arithmetic: `+`, `-`, `*`, `/`
- Rounding: `ROUND(value, decimals)`
- Absolute value: `ABS(value)`
- String concatenation: `||` (e.g., `first_name || ' ' || last_name`)

---

## 8. ORDER BY: Sorting Results

> **ORDER BY sorts your results**

Structure:
```sql
SELECT columns
FROM table
WHERE conditions
ORDER BY column [ASC|DESC]
```

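**Important:** `ORDER BY` runs **after** filtering (`WHERE`) and after the `SELECT` calculations. Only `LIMIT` is applied later, which is why `ORDER BY ... LIMIT 10` gives you the top 10.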

### Example 1: Sort Ascending (Default)

**Business question:** "Show me items sorted by price, lowest first."

In [None]:
con.execute("""
    SELECT 
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE "Price Per Unit" IS NOT NULL
    ORDER BY "Price Per Unit" ASC
    LIMIT 10
""").df()

**Note:** `ASC` is optional - it's the default. This is the same:
```sql
ORDER BY "Price Per Unit"
```

### Example 2: Sort Descending

**Business question:** "Show me the most expensive items first."

In [None]:
con.execute("""
    SELECT 
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE "Price Per Unit" IS NOT NULL
    ORDER BY "Price Per Unit" DESC
    LIMIT 10
""").df()

### Example 3: Sort by Multiple Columns

**Business question:** "Sort by payment method, then by price within each payment method."

In [None]:
con.execute("""
    SELECT 
        "Payment Method",
        Item,
        "Price Per Unit"
    FROM cafe
    WHERE "Price Per Unit" IS NOT NULL AND "Payment Method" IS NOT NULL
    ORDER BY "Payment Method" ASC, "Price Per Unit" DESC
    LIMIT 20
""").df()

**How it works:**
1. First, sort by `"Payment Method"` (A to Z)
2. Within each payment method, sort by `"Price Per Unit"` (high to low)

**Use case:** "Top products in each category"

### Example 4: Sort by Calculated Column

**Business question:** "Show highest revenue transactions first."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit",
        Quantity,
        "Price Per Unit" * Quantity AS revenue
    FROM cafe
    WHERE "Price Per Unit" IS NOT NULL AND Quantity IS NOT NULL
    ORDER BY revenue DESC
    LIMIT 10
""").df()

**Note:** You can reference the alias (`revenue`) in ORDER BY, even though the calculation happens in SELECT!

---

## 9. Putting It All Together

Let's combine everything we've learned into more complex queries.

### Example 1: Complex Business Query

**Business question:** 
> "Show me all salad sales priced at $4 or more, paid by credit card, with revenue calculated, sorted by revenue descending."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Price Per Unit",
        Quantity,
        "Payment Method",
        ROUND("Price Per Unit" * Quantity, 2) AS revenue
    FROM cafe
    WHERE Item = 'Salad'
      AND "Price Per Unit" >= 4.00
      AND "Payment Method" = 'Credit Card'
      AND "Price Per Unit" IS NOT NULL
      AND Quantity IS NOT NULL
    ORDER BY revenue DESC
    LIMIT 20
""").df()

**Breaking it down:**
1. **SELECT** - Choose columns and calculate revenue
2. **FROM** - Which table
3. **WHERE** - Filter to salad, price >= $4, credit card, non-NULLs
4. **ORDER BY** - Sort by revenue, highest first
5. **LIMIT** - Show top 20

This reads almost like English!

### Example 2: Find High-Value Transactions

**Business question:**
> "What are our largest transactions (revenue $20+)? Show item, revenue, and payment method."

In [None]:
con.execute("""
    SELECT 
        "Transaction ID",
        Item,
        "Payment Method",
        ROUND("Price Per Unit" * Quantity, 2) AS revenue
    FROM cafe
    WHERE "Price Per Unit" * Quantity >= 20
      AND "Price Per Unit" IS NOT NULL
      AND Quantity IS NOT NULL
    ORDER BY revenue DESC
    LIMIT 15
""").df()

**Note:** We can use calculations in the WHERE clause! `WHERE "Price Per Unit" * Quantity >= 20`

---

## 10. Query Building Strategy

When writing complex queries, build them incrementally:

### Step 1: Start Simple
```sql
SELECT * FROM cafe LIMIT 5
```

### Step 2: Add Specific Columns
```sql
SELECT Item, "Price Per Unit", Quantity FROM cafe LIMIT 5
```

### Step 3: Add Filtering
```sql
SELECT Item, "Price Per Unit", Quantity 
FROM cafe 
WHERE "Price Per Unit" > 5
LIMIT 5
```

### Step 4: Add Calculations
```sql
SELECT Item, "Price Per Unit", Quantity, "Price Per Unit" * Quantity AS revenue
FROM cafe 
WHERE "Price Per Unit" > 5
LIMIT 5
```

### Step 5: Add Sorting
```sql
SELECT Item, "Price Per Unit", Quantity, "Price Per Unit" * Quantity AS revenue
FROM cafe 
WHERE "Price Per Unit" > 5
ORDER BY revenue DESC
LIMIT 5
```

### Step 6: Remove LIMIT (or increase it)
```sql
SELECT Item, "Price Per Unit", Quantity, "Price Per Unit" * Quantity AS revenue
FROM cafe 
WHERE "Price Per Unit" > 5
ORDER BY revenue DESC
```

**Key tip:** Always use `LIMIT` while developing. Remove it only when you're ready for full results.

---

## 11. Common Mistakes Summary

### ❌ Mistake 1: Using = NULL
```sql
WHERE column = NULL  -- ❌ Doesn't work!
WHERE column IS NULL -- ✅ Correct
```

### ❌ Mistake 2: Forgetting Quotes on Strings
```sql
WHERE Item = Coffee     -- ❌ Error
WHERE Item = 'Coffee'   -- ✅ Correct
```

### ❌ Mistake 3: Confusing AND vs OR
```sql
-- "Coffee AND Tea" = rows that are BOTH (impossible!)
WHERE Item = 'Coffee' AND Item = 'Tea'  -- ❌ Returns nothing

-- "Coffee OR Tea" = rows that are EITHER
WHERE Item = 'Coffee' OR Item = 'Tea'   -- ✅ Correct
-- Or better:
WHERE Item IN ('Coffee', 'Tea')         -- ✅ Even better
```

### ❌ Mistake 4: Missing Parentheses in Complex Conditions
```sql
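-- This might not mean what you think:
WHERE "Price Per Unit" > 5 AND Item = 'Coffee' OR Item = 'Tea'
-- AND binds tighter than OR, so this is read as:
-- ("Price Per Unit" > 5 AND Item = 'Coffee') OR Item = 'Tea'

-- Use parentheses to say what you mean:
WHERE "Price Per Unit" > 5 AND (Item = 'Coffee' OR Item = 'Tea')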
```

### ❌ Mistake 5: Assuming a Default Sort Order
Without `ORDER BY`, results are in **arbitrary order**. Don't assume any default sorting!

---

## 12. Summary: What We Learned

### SQL Fundamentals
1. **Why SQL?** - Declarative, set-based, standard, optimized
2. **Why DuckDB?** - No server, query files directly, fast, modern

### Core Concepts
3. **SELECT** - Choose which columns to return
4. **WHERE** - Filter which rows to return
   - Comparison operators: `=`, `>`, `<`, `>=`, `<=`, `!=`
   - Logical operators: `AND`, `OR`, `NOT`
   - Special operators: `IN`, `BETWEEN`, `LIKE`
   - **NULL handling:** `IS NULL`, `IS NOT NULL` (never `= NULL`!)
5. **Calculated columns** - Create new columns with expressions
6. **ORDER BY** - Sort results (ASC or DESC)

### Query Building
7. Build incrementally: Start simple, add complexity
8. Use `LIMIT` while developing
9. Check for NULLs in WHERE clauses

### Most Important Takeaway
> **🚨 NEVER use `= NULL` - Always use `IS NULL` or `IS NOT NULL`**

This is the #1 mistake beginners make. Don't be that person!

---

## Next: Aggregations & GROUP BY

In the next notebook, we'll learn how to:
- **Summarize data** with COUNT, SUM, AVG, MIN, MAX
- **Group data** with GROUP BY
- **Filter groups** with HAVING
- Understand the critical difference between **WHERE and HAVING**

You'll go from "Show me transactions" to "Show me total revenue by category" - that's when SQL gets really powerful!

**Ready?** Open `day1_block_b_02_aggregations.ipynb`

---

**Great work!** 🎉 You've learned the foundations of SQL. Practice these concepts - they're the building blocks for everything else.