# 2f Window Functions: The Power of Row-Level Calculations

## Introduction

**Window functions are one of the most powerful yet misunderstood features in SQL.**

Unlike regular aggregate functions (SUM, AVG, COUNT) that collapse rows into a single result, **window functions perform calculations across a set of rows while keeping all individual rows visible**.

**The Key Difference:**
- **Regular Aggregation:** `SUM(sales)` → Returns ONE row with total
- **Window Function:** `SUM(sales) OVER()` → Returns ALL rows, each with the total

**Why Window Functions Matter:**
- 📊 Calculate running totals, moving averages, and rankings
- 📈 Compare individual rows to group averages
- 🎯 Identify top performers within each category
- 📉 Calculate percentiles and cumulative distributions
- 🔍 Answer "what percentage of total does this represent?" questions

**Database:** This course uses the **Snowflake** database. All examples are Snowflake-compatible.

**What you'll learn:**
- Understanding the difference between aggregation and window functions
- The intuitive framework to remember window function syntax
- ROW_NUMBER(), RANK(), DENSE_RANK() for rankings
- SUM(), AVG(), COUNT() as window functions
- PARTITION BY - grouping within window functions
- ORDER BY - ordering within window functions
- ROWS BETWEEN - defining window frames
- Real-world scenarios with simple, clear examples

---

## The Intuitive Framework: Remember Window Functions in 4 Parts

Think of window functions like **"looking through a window"** at your data:

```sql
FUNCTION() OVER(
    PARTITION BY column    -- "Which window?" (grouping)
    ORDER BY column        -- "How to arrange?" (sorting)
    ROWS BETWEEN ...       -- "How many rows?" (frame)
)
```

**Memory Framework:**
1. **FUNCTION()** - What calculation? (SUM, AVG, ROW_NUMBER, etc.)
2. **PARTITION BY** - Which groups? (like GROUP BY, but keeps all rows)
3. **ORDER BY** - How to sort within each partition?
4. **ROWS BETWEEN** - Which rows to include? (optional, for running calculations)
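
As a concrete instance of the four parts, the sketch below fills in every slot at once. It assumes the `employee_sales` table defined in the Dataset Setup section of this lesson; the alias `avg_last_2_sales` is only illustrative.

```sql
-- All four parts in one expression (table and columns come from the
-- Dataset Setup section; the alias is illustrative)
SELECT
    employee_name,
    department,
    sale_date,
    sales_amount,
    AVG(sales_amount) OVER (            -- 1. FUNCTION: what to calculate
        PARTITION BY department         -- 2. PARTITION BY: one window per department
        ORDER BY sale_date              -- 3. ORDER BY: oldest sale first
        ROWS BETWEEN 1 PRECEDING        -- 4. FRAME: the previous row ...
                 AND CURRENT ROW        --    ... plus the current row
    ) AS avg_last_2_sales
FROM employee_sales
ORDER BY department, sale_date;
```

Reading the expression top to bottom follows the same order as the memory framework: function, partition, order, frame.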

---

## Dataset Setup: Employee Sales Data

Let's create a simple table with employee sales data to understand window functions intuitively.


In [None]:
-- Create employee_sales table
CREATE OR REPLACE TABLE employee_sales (
    employee_id INT,
    employee_name VARCHAR(50),
    department VARCHAR(50),
    sale_date DATE,
    sales_amount DECIMAL(10, 2)
);


In [None]:
-- Insert dummy data
INSERT INTO employee_sales VALUES
    (1, 'Alice', 'Sales', '2024-01-15', 5000),
    (2, 'Bob', 'Sales', '2024-01-20', 3000),
    (3, 'Charlie', 'Sales', '2024-01-25', 7000),
    (4, 'Alice', 'Sales', '2024-02-10', 6000),
    (5, 'Bob', 'Sales', '2024-02-15', 4000),
    (6, 'Charlie', 'Sales', '2024-02-20', 8000),
    (7, 'Diana', 'Marketing', '2024-01-18', 2000),
    (8, 'Eve', 'Marketing', '2024-01-22', 3500),
    (9, 'Diana', 'Marketing', '2024-02-12', 2500),
    (10, 'Eve', 'Marketing', '2024-02-18', 4500),
    (11, 'Frank', 'IT', '2024-01-30', 1000),
    (12, 'Grace', 'IT', '2024-02-05', 1500),
    (13, 'Frank', 'IT', '2024-02-25', 1200);


In [None]:
-- View the data
SELECT * FROM employee_sales ORDER BY department, sale_date;


---

## Part 1: Regular Aggregation vs Window Functions

**The Fundamental Difference:**

When you use `SUM()` with `GROUP BY`, you get **one row per group**:


In [None]:
-- Regular aggregation: Collapses rows
-- Returns ONE row per department
SELECT 
    department,
    SUM(sales_amount) as total_sales
FROM employee_sales
GROUP BY department;


When you use `SUM() OVER()`, you get **all rows** with the total added to each:


In [None]:
-- Window function: Keeps all rows, adds calculation
-- Returns ALL rows, each with the total
SELECT 
    employee_name,
    department,
    sales_amount,
    SUM(sales_amount) OVER() as grand_total
FROM employee_sales
ORDER BY department, employee_name;


**Key Insight:** 
- `GROUP BY` → **Collapses** rows (fewer rows returned)
- `OVER()` → **Preserves** rows (same number of rows returned)
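
One way to check this is to count the rows each approach returns. The sketch below assumes `employee_sales` holds exactly the 13 sample rows inserted earlier; the subquery aliases `g` and `w` are arbitrary.

```sql
-- GROUP BY collapses the 13 sales into one row per department
SELECT COUNT(*) AS grouped_rows
FROM (
    SELECT department, SUM(sales_amount) AS total_sales
    FROM employee_sales
    GROUP BY department
) AS g;
-- grouped_rows = 3 (Sales, Marketing, IT)

-- OVER() keeps every input row and attaches the total to each
SELECT COUNT(*) AS windowed_rows
FROM (
    SELECT sales_amount, SUM(sales_amount) OVER () AS grand_total
    FROM employee_sales
) AS w;
-- windowed_rows = 13
```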

---

## Part 2: PARTITION BY - The "Which Window?" Clause

**PARTITION BY** is like `GROUP BY`, but it **keeps all rows** instead of collapsing them.

**Think of it as:** "Calculate this function separately for each group, but show me all the rows"

### Example: Total Sales Per Department (Keeping All Rows)


In [None]:
-- Calculate total sales per department, but keep all rows
SELECT 
    employee_name,
    department,
    sales_amount,
    SUM(sales_amount) OVER(PARTITION BY department) as dept_total
FROM employee_sales
ORDER BY department, sales_amount DESC;


**What happened:**
- Each row shows its individual `sales_amount`
- Each row also shows the `dept_total` (sum of all sales in that department)
- All rows are preserved (no collapsing)

**Use Case:** "I want to see each sale AND know what the department total is"
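
`PARTITION BY` also accepts more than one expression. As a sketch, the query below splits each department further by calendar month; it assumes Snowflake's `DATE_TRUNC` function and the sample data from this lesson, and the alias `dept_month_total` is illustrative.

```sql
-- One window per (department, month) combination
SELECT
    employee_name,
    department,
    sale_date,
    sales_amount,
    SUM(sales_amount) OVER (
        PARTITION BY department, DATE_TRUNC('month', sale_date)
    ) AS dept_month_total
FROM employee_sales
ORDER BY department, sale_date;
-- With the 13 sample rows, Sales shows 15000 on its January rows
-- and 18000 on its February rows
```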

### Example: Average Sales Per Department


In [None]:
-- Compare each sale to department average
SELECT 
    employee_name,
    department,
    sales_amount,
    AVG(sales_amount) OVER(PARTITION BY department) as dept_avg,
    sales_amount - AVG(sales_amount) OVER(PARTITION BY department) as diff_from_avg
FROM employee_sales
ORDER BY department, sales_amount DESC;


**Use Case:** "Which employees are above/below their department average?"
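
To turn the difference into a direct answer, the comparison can be wrapped in a `CASE` expression. This is a minimal sketch on the sample data; the labels and the alias `vs_dept_avg` are illustrative choices.

```sql
-- Label each sale relative to its department average
SELECT
    employee_name,
    department,
    sales_amount,
    ROUND(AVG(sales_amount) OVER (PARTITION BY department), 2) AS dept_avg,
    CASE
        WHEN sales_amount > AVG(sales_amount) OVER (PARTITION BY department) THEN 'above'
        WHEN sales_amount < AVG(sales_amount) OVER (PARTITION BY department) THEN 'below'
        ELSE 'equal'
    END AS vs_dept_avg
FROM employee_sales
ORDER BY department, sales_amount DESC;
-- Sales averages 5500 on the sample data, so its 8000, 7000 and 6000
-- sales are labelled 'above' and the other three 'below'
```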

---

## Part 3: ORDER BY in Window Functions - The "How to Arrange?" Clause

**ORDER BY** within `OVER()` does two things:
1. **For ranking functions** (ROW_NUMBER, RANK): Determines the order of ranking
2. **For aggregate functions** (SUM, AVG): Creates a **running calculation**

### Example 1: Ranking Sales Within Each Department


In [None]:
-- Rank each individual sale within its department (highest amount first)
SELECT 
    employee_name,
    department,
    sales_amount,
    ROW_NUMBER() OVER(PARTITION BY department ORDER BY sales_amount DESC) as rank_in_dept
FROM employee_sales
ORDER BY department, rank_in_dept;


**What happened:**
- `ROW_NUMBER()` assigns 1, 2, 3... based on `ORDER BY sales_amount DESC`
- `PARTITION BY department` means ranking restarts for each department
- Each row gets a rank while keeping all rows visible

### Example 2: Running Total (Cumulative Sum)

When you use `ORDER BY` with aggregate functions, you get **running calculations**:


In [None]:
-- Calculate running total of sales over time
SELECT 
    employee_name,
    sale_date,
    sales_amount,
    SUM(sales_amount) OVER(ORDER BY sale_date) as running_total
FROM employee_sales
ORDER BY sale_date;


**What happened:**
- Each row shows the **cumulative sum** up to that point
- `ORDER BY sale_date` determines the order of accumulation
- First row: just its own amount
- Second row: first + second amount
- Third row: first + second + third amount
- And so on...
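
The same accumulation applies to other aggregates. As a sketch on the sample data, the query below places a running count and a running average next to the running total; the aliases `sales_so_far` and `running_avg` are illustrative.

```sql
-- Running total, running count and running average over time
SELECT
    sale_date,
    sales_amount,
    SUM(sales_amount) OVER (ORDER BY sale_date) AS running_total,
    COUNT(*)          OVER (ORDER BY sale_date) AS sales_so_far,
    ROUND(AVG(sales_amount) OVER (ORDER BY sale_date), 2) AS running_avg
FROM employee_sales
ORDER BY sale_date;
-- First three rows on the sample data (amounts 5000, 2000, 3000 by date):
--   running_total = 5000, 7000, 10000
--   sales_so_far  = 1, 2, 3
--   running_avg   = 5000, 3500, 3333.33
```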

### Example 3: Running Total Per Department


In [None]:
-- Running total within each department
SELECT 
    employee_name,
    department,
    sale_date,
    sales_amount,
    SUM(sales_amount) OVER(
        PARTITION BY department 
        ORDER BY sale_date
    ) as running_total_in_dept
FROM employee_sales
ORDER BY department, sale_date;


**What happened:**
- Running total **restarts** for each department (PARTITION BY)
- Within each department, it accumulates chronologically (ORDER BY sale_date)
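
A partitioned running total can be combined with the plain partition total to show how much of the department's sales had accumulated by each date. This is a sketch only; the alias `cumulative_pct_of_dept` is illustrative.

```sql
-- Cumulative share of the department total, by date
SELECT
    employee_name,
    department,
    sale_date,
    sales_amount,
    ROUND(
        100.0
        * SUM(sales_amount) OVER (PARTITION BY department ORDER BY sale_date)
        / SUM(sales_amount) OVER (PARTITION BY department),
        2
    ) AS cumulative_pct_of_dept
FROM employee_sales
ORDER BY department, sale_date;
-- The latest sale in each department reaches 100.00
```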

---

## Part 4: Common Window Functions

### ROW_NUMBER() - Sequential Numbering

Assigns a unique sequential number (1, 2, 3...) to each row within the partition.


In [None]:
-- Number each sale within each department
SELECT 
    employee_name,
    department,
    sales_amount,
    ROW_NUMBER() OVER(PARTITION BY department ORDER BY sales_amount DESC) as row_num
FROM employee_sales
ORDER BY department, row_num;


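**Key Point:** ROW_NUMBER() always gives unique numbers, even if values are tied. The order among tied rows is not guaranteed, so add a tiebreaker column to `ORDER BY` when you need repeatable results.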

### RANK() - Ranking with Gaps

Assigns rank with gaps when there are ties.


In [None]:
-- Rank sales within each department
-- If two employees have same sales, they get same rank, next rank skips
SELECT 
    employee_name,
    department,
    sales_amount,
    RANK() OVER(PARTITION BY department ORDER BY sales_amount DESC) as rank_with_gaps
FROM employee_sales
ORDER BY department, rank_with_gaps;


**Example:** If two employees tie for rank 1, the next one gets rank 3 (not rank 2).

### DENSE_RANK() - Ranking without Gaps

Assigns rank without gaps when there are ties.


In [None]:
-- Rank sales without gaps
-- If two employees have same sales, they get same rank, next rank continues
SELECT 
    employee_name,
    department,
    sales_amount,
    DENSE_RANK() OVER(PARTITION BY department ORDER BY sales_amount DESC) as rank_no_gaps
FROM employee_sales
ORDER BY department, rank_no_gaps;


**Example:** If two employees tie for rank 1, the next one gets rank 2 (no gap).

**Comparison:**
- **ROW_NUMBER()**: 1, 2, 3, 4... (always unique, no ties)
- **RANK()**: 1, 1, 3, 4... (ties allowed, gaps after ties)
- **DENSE_RANK()**: 1, 1, 2, 3... (ties allowed, no gaps)
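
The sample `employee_sales` data has no tied amounts within a department, so all three functions return the same numbers there. The sketch below uses a small inline `VALUES` table with a deliberate tie to show the difference; the column names `player` and `score` are made up for illustration, and `player` is added as a tiebreaker so that `ROW_NUMBER()` is repeatable.

```sql
-- Two rows tie on score = 100
SELECT
    player,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num,
    RANK()       OVER (ORDER BY score DESC)         AS rank_with_gaps,
    DENSE_RANK() OVER (ORDER BY score DESC)         AS rank_no_gaps
FROM (VALUES ('A', 100), ('B', 100), ('C', 90), ('D', 80)) AS t (player, score)
ORDER BY row_num;
-- player | score | row_num | rank_with_gaps | rank_no_gaps
-- A      | 100   | 1       | 1              | 1
-- B      | 100   | 2       | 1              | 1
-- C      | 90    | 3       | 3              | 2
-- D      | 80    | 4       | 4              | 3
```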

---

## Part 5: Window Frames - ROWS BETWEEN

**Window frames** let you control **which rows** are included in the calculation.

**Common Frame Specifications:**
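- `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` - All rows from the start of the partition to the current row. When `ORDER BY` is present and no frame is written, aggregates default to the `RANGE` form of this frame, which gives the same result unless several rows tie on the `ORDER BY` value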
- `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING` - Previous, current, and next row
- `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` - Last 3 rows including current
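
One detail is worth checking when the `ORDER BY` column contains duplicates. Per standard SQL and Snowflake's documentation, an aggregate with `ORDER BY` and no explicit frame uses `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which treats tied rows as one step. The sketch below, on a made-up inline table (`day_no` and `amount` are illustrative names), contrasts that default with an explicit `ROWS` frame.

```sql
-- Two rows share day_no = 1
SELECT
    day_no,
    amount,
    SUM(amount) OVER (ORDER BY day_no) AS default_frame_total,
    SUM(amount) OVER (
        ORDER BY day_no, amount
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_frame_total
FROM (VALUES (1, 10), (1, 20), (2, 30)) AS t (day_no, amount)
ORDER BY day_no, amount;
-- day_no | amount | default_frame_total | rows_frame_total
-- 1      | 10     | 30                  | 10
-- 1      | 20     | 30                  | 30
-- 2      | 30     | 60                  | 60
```

The sample `employee_sales` rows all have distinct dates, so the running totals in this lesson are the same under either frame.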

### Example: Moving Average (3-Row Window)


In [None]:
-- Calculate a 3-row moving average (the three most recent sales, not three calendar days)
-- For each row, average of current row + 2 previous rows
SELECT 
    employee_name,
    sale_date,
    sales_amount,
    AVG(sales_amount) OVER(
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg_3_rows
FROM employee_sales
WHERE department = 'Sales'
ORDER BY sale_date;


**What happened:**
- Row 1: Average of row 1 only (not enough previous rows)
- Row 2: Average of rows 1-2
- Row 3: Average of rows 1-3
- Row 4: Average of rows 2-4 (drops row 1, adds row 4)
- And so on...
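
The frame list above also mentions `1 PRECEDING AND 1 FOLLOWING`. As a sketch, the query below applies that centered frame to the same Sales rows and adds a `COUNT(*)` over the identical frame, so you can see how many rows each average is based on. The aliases are illustrative.

```sql
-- Centered average: previous, current and next row
SELECT
    sale_date,
    sales_amount,
    COUNT(*) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS rows_in_frame,
    ROUND(AVG(sales_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ), 2) AS centered_avg
FROM employee_sales
WHERE department = 'Sales'
ORDER BY sale_date;
-- On the six Sales rows of the sample data:
--   rows_in_frame = 2, 3, 3, 3, 3, 2
--   centered_avg  = 4000, 5000, 5333.33, 5666.67, 6000, 6000
```

The first and last rows have only two rows in their frame because the frame is cut off at the edge of the partition.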

### Example: Compare to Previous Row


In [None]:
-- Compare each sale to the previous sale
SELECT 
    employee_name,
    sale_date,
    sales_amount,
    LAG(sales_amount, 1) OVER(ORDER BY sale_date) as previous_sale,
    sales_amount - LAG(sales_amount, 1) OVER(ORDER BY sale_date) as change_from_previous
FROM employee_sales
WHERE department = 'Sales'
ORDER BY sale_date;


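**LAG()** - Gets the value from the previous row (`NULL` for the first row, which has no predecessor)  
**LEAD()** - Gets the value from the next row (`NULL` for the last row, which has no successor)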
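
Both functions accept `PARTITION BY`, which keeps the comparison within one group. The sketch below looks backward and forward within each employee's own sales rather than across the whole department; the aliases are illustrative.

```sql
-- Previous and next sale for the same employee
SELECT
    employee_name,
    sale_date,
    sales_amount,
    LAG(sales_amount)  OVER (PARTITION BY employee_name ORDER BY sale_date) AS previous_sale,
    LEAD(sales_amount) OVER (PARTITION BY employee_name ORDER BY sale_date) AS next_sale
FROM employee_sales
WHERE department = 'Sales'
ORDER BY employee_name, sale_date;
-- Alice on the sample data:
--   2024-01-15 | 5000 | previous_sale NULL | next_sale 6000
--   2024-02-10 | 6000 | previous_sale 5000 | next_sale NULL
```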

---

## Part 6: Practical Examples

### Example 1: Top 2 Sales Per Department


In [None]:
-- Find top 2 sales per department
SELECT 
    employee_name,
    department,
    sales_amount,
    RANK() OVER(PARTITION BY department ORDER BY sales_amount DESC) as dept_rank
FROM employee_sales
QUALIFY RANK() OVER(PARTITION BY department ORDER BY sales_amount DESC) <= 2
ORDER BY department, dept_rank;


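**Note:** `QUALIFY` filters on window function results, much as `HAVING` filters on aggregate results. Snowflake supports it, but it is not part of standard SQL. Because `RANK()` gives tied rows the same rank, this query can return more than two rows for a department when amounts tie.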
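
For comparison, the same filter can be written without `QUALIFY`. A window function cannot appear in the `WHERE` clause of the query that computes it, so the usual workaround is to compute the rank in a CTE (or subquery) and filter one level up. This is a sketch of that equivalent form; the CTE name `ranked` is arbitrary.

```sql
-- Equivalent to the QUALIFY query: rank first, then filter
WITH ranked AS (
    SELECT
        employee_name,
        department,
        sales_amount,
        RANK() OVER (PARTITION BY department ORDER BY sales_amount DESC) AS dept_rank
    FROM employee_sales
)
SELECT employee_name, department, sales_amount, dept_rank
FROM ranked
WHERE dept_rank <= 2
ORDER BY department, dept_rank;
-- On the sample data: IT returns 1500 and 1200, Marketing 4500 and 3500,
-- Sales 8000 and 7000
```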

### Example 2: Percentage of Department Total


In [None]:
-- What percentage of department total does each sale represent?
SELECT 
    employee_name,
    department,
    sales_amount,
    SUM(sales_amount) OVER(PARTITION BY department) as dept_total,
    ROUND(
        (sales_amount * 100.0 / SUM(sales_amount) OVER(PARTITION BY department)), 
        2
    ) as pct_of_dept_total
FROM employee_sales
ORDER BY department, sales_amount DESC;


### Example 3: Running Total with Reset Per Employee


In [None]:
-- Running total of sales per employee over time
SELECT 
    employee_name,
    sale_date,
    sales_amount,
    SUM(sales_amount) OVER(
        PARTITION BY employee_name 
        ORDER BY sale_date
    ) as employee_running_total
FROM employee_sales
ORDER BY employee_name, sale_date;


---

## Quick Reference: Window Function Syntax Framework

**Remember this pattern:**

```sql
FUNCTION() OVER(
    PARTITION BY column1, column2    -- Optional: Grouping
    ORDER BY column3                  -- Optional: Sorting
    ROWS BETWEEN ...                 -- Optional: Frame
)
```

**Common Functions:**
- **Ranking:** `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`
- **Aggregates:** `SUM()`, `AVG()`, `COUNT()`, `MIN()`, `MAX()`
- **Navigation:** `LAG()`, `LEAD()`, `FIRST_VALUE()`, `LAST_VALUE()`

**Common Patterns:**

1. **Total per group (all rows):**
   ```sql
   SUM(amount) OVER(PARTITION BY group)
   ```

2. **Ranking within group:**
   ```sql
   RANK() OVER(PARTITION BY group ORDER BY value DESC)
   ```

3. **Running total:**
   ```sql
   SUM(amount) OVER(ORDER BY date)
   ```

4. **Running total per group:**
   ```sql
   SUM(amount) OVER(PARTITION BY group ORDER BY date)
   ```

5. **Moving average:**
   ```sql
   AVG(amount) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
   ```

---

## Key Takeaways

✅ **Window functions preserve all rows** (unlike GROUP BY which collapses rows)  
✅ **PARTITION BY** = grouping (like GROUP BY, but keeps all rows)  
✅ **ORDER BY** = sorting (creates running calculations with aggregates)  
✅ **ROWS BETWEEN** = frame specification (which rows to include)  
✅ **Use window functions when** you need row-level calculations based on groups or ordered sequences

**When to use window functions:**
- 📊 Rankings and percentiles
- 📈 Running totals and moving averages
- 🔍 Comparing rows to group statistics
- 📉 Cumulative calculations
- 🎯 Top N per group queries

---

## Practice Exercise

Try these on your own:

1. Calculate the difference between each sale and the department average
2. Find employees whose sales are in the top 25% of their department
3. Calculate a 7-day moving average of sales
4. Show each sale with the previous and next sale amounts
5. Calculate what percentage each sale represents of the grand total
