# Table

In [0]:
CREATE OR REPLACE TABLE data_engineering_practice.sql.employees (
    emp_id INT,
    emp_name VARCHAR(50),
    department VARCHAR(50),
    salary INT
);

INSERT INTO data_engineering_practice.sql.employees VALUES
(1, 'Amit',   'HR',      40000),
(2, 'Neha',   'HR',      50000),
(3, 'Ritu',   'HR',      50000),
(4, 'Raj',    'IT',      60000),
(5, 'Simran', 'IT',      70000),
(6, 'Vikram', 'IT',      70000),
(7, 'Arjun',  'Finance', 45000),
(8, 'Meena',  'Finance', 55000),
(9, 'Kabir',  'Finance', 55000);



### 1. ROW_NUMBER()

Gives a unique number to each row within a group based on order. Even if two rows tie, they still get different numbers; which tied row comes first is arbitrary unless the ORDER BY includes a tie-breaker.

In [0]:
SELECT emp_name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM data_engineering_practice.sql.employees;

/*
>>In each department, employees are sorted by salary (highest → lowest).
>>Then they are numbered 1, 2, 3 … without skipping.
>>Useful when you want to pick the top-N employees in each department.*/
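
A minimal sketch of the top-N use case mentioned above, assuming N = 2 and using `emp_id` as a tie-breaker (both are illustrative choices, not part of the original notes). The window function is computed in a subquery because it cannot be filtered directly in WHERE.

```sql
-- Top 2 earners per department.
-- Assumption: ties on salary are broken by emp_id so the result is repeatable.
SELECT emp_name, department, salary
FROM (
    SELECT emp_name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department
                              ORDER BY salary DESC, emp_id) AS row_num
    FROM data_engineering_practice.sql.employees
) ranked
WHERE row_num <= 2
ORDER BY department, salary DESC;
```

If tied employees should all be kept, swapping ROW_NUMBER() for RANK() or DENSE_RANK() is the usual adjustment.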


### 2. RANK()

Similar to ROW_NUMBER(), but rows that tie get the same rank. The next rank then skips ahead by the number of tied rows.

In [0]:
SELECT emp_name, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_pos
FROM data_engineering_practice.sql.employees;

/*
>>If two employees tie for the highest salary in a department, both are rank 1 (e.g. Neha and Ritu in HR).
>>The next employee will get rank 3, not 2.
>>Useful in contests or leaderboards where ties share the same position.*/

### 3. DENSE_RANK()

Like RANK(), but it doesn’t skip numbers after ties.

In [0]:
SELECT emp_name, department, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank_pos
FROM data_engineering_practice.sql.employees;

/*
>>In HR, Neha & Ritu both rank 1, Amit becomes rank 2 (not 3).
>>Keeps numbers compact.
>>Often used in reporting when you don’t want gaps in rank.*/
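
To see the three numbering functions side by side, the sketch below runs them over the HR rows only. The `emp_id` tie-breaker in ROW_NUMBER() is an assumption added here so its output is repeatable.

```sql
-- ROW_NUMBER vs RANK vs DENSE_RANK on the same ordering (HR only).
SELECT emp_name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS row_num,
       RANK()       OVER (ORDER BY salary DESC)         AS rank_pos,
       DENSE_RANK() OVER (ORDER BY salary DESC)         AS dense_rank_pos
FROM data_engineering_practice.sql.employees
WHERE department = 'HR'
ORDER BY row_num;

-- emp_name  salary  row_num  rank_pos  dense_rank_pos
-- Neha      50000   1        1         1
-- Ritu      50000   2        1         1
-- Amit      40000   3        3         2
```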


### 4. NTILE(n)

Splits rows into n roughly equal groups (buckets).

In [0]:
SELECT emp_name, salary,
       NTILE(3) OVER (ORDER BY salary DESC) AS bucket
FROM data_engineering_practice.sql.employees;
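
With nine rows, NTILE(3) puts three rows in each bucket. When the row count does not divide evenly, the earlier buckets get the extra rows. The sketch below shows both cases; the `emp_id` tie-breaker is an assumption added so that employees with equal salaries (e.g. Neha and Ritu) land in the same bucket on every run.

```sql
-- 9 rows: NTILE(3) gives bucket sizes 3, 3, 3; NTILE(4) gives 3, 2, 2, 2.
SELECT emp_name, salary,
       NTILE(3) OVER (ORDER BY salary DESC, emp_id) AS bucket_of_3,
       NTILE(4) OVER (ORDER BY salary DESC, emp_id) AS bucket_of_4
FROM data_engineering_practice.sql.employees
ORDER BY salary DESC, emp_id;
```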


### 5. LAG()

Fetches the previous row’s value.

In [0]:
SELECT emp_name, salary,
       LAG(salary) OVER (ORDER BY salary) AS prev_salary
FROM data_engineering_practice.sql.employees;

/*
>>For each employee, shows what the previous employee’s salary was.
>>First row has NULL because no one is before it.
>>Useful for calculating differences between rows (e.g. salary increase from previous).*/
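
A short sketch of the row-to-row difference mentioned above. The column aliases and the `emp_id` tie-breaker are illustrative assumptions. The three-argument form of LAG (offset, then default) replaces the leading NULL with a chosen value.

```sql
SELECT emp_name, salary,
       LAG(salary) OVER (ORDER BY salary, emp_id)          AS prev_salary,
       -- NULL on the first row, because there is no previous salary
       salary - LAG(salary) OVER (ORDER BY salary, emp_id) AS diff_from_prev,
       -- offset 1, default 0 instead of NULL on the first row
       LAG(salary, 1, 0) OVER (ORDER BY salary, emp_id)    AS prev_salary_or_zero
FROM data_engineering_practice.sql.employees
ORDER BY salary, emp_id;
```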


### 6. LEAD()

Fetches the next row’s value.

In [0]:
SELECT emp_name, salary,
       LEAD(salary) OVER (ORDER BY salary) AS next_salary
FROM data_engineering_practice.sql.employees;

/*
>>For each employee, shows what the next employee’s salary is.
>>Last row has NULL because no one is after it.
>>Useful for comparing current vs. next record.*/
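
As one possible current-vs-next comparison, the sketch below computes, within each department, how far each employee is from the next higher salary. Partitioning by department and the `emp_id` tie-breaker are assumptions made for this example.

```sql
SELECT emp_name, department, salary,
       LEAD(salary) OVER (PARTITION BY department
                          ORDER BY salary, emp_id)          AS next_salary,
       -- NULL for the last row in each department
       LEAD(salary) OVER (PARTITION BY department
                          ORDER BY salary, emp_id) - salary AS gap_to_next
FROM data_engineering_practice.sql.employees
ORDER BY department, salary, emp_id;
```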


### 7. SUM() / AVG() with OVER

Aggregate functions over a partition, but rows are not collapsed.

In [0]:
SELECT emp_name, department, salary,
       SUM(salary) OVER (PARTITION BY department) AS dept_total,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM data_engineering_practice.sql.employees;

/*
>>For every employee, you see their department’s total & average salary.
>>Unlike GROUP BY, you don’t lose row-level detail.
>>Perfect for showing both employee-level and group-level info in one query.*/
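
Because the group-level value sits on every row, it can be combined with row-level columns directly. A minimal sketch, with illustrative alias names, of each employee's share of the department total and distance from the department average:

```sql
SELECT emp_name, department, salary,
       -- 100.0 forces decimal division rather than integer division
       ROUND(100.0 * salary / SUM(salary) OVER (PARTITION BY department), 1) AS pct_of_dept_total,
       salary - AVG(salary) OVER (PARTITION BY department)                   AS diff_from_dept_avg
FROM data_engineering_practice.sql.employees
ORDER BY department, salary DESC;
```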


### 8. Running Total (Cumulative SUM)

Keeps adding row by row, in order.

In [0]:
SELECT emp_name, salary,
       SUM(salary) OVER (ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM data_engineering_practice.sql.employees;

/*
>>Salaries are ordered (lowest → highest).
>>Running total is computed row by row.
>>Useful for things like cumulative sales or progressive totals.

>>UNBOUNDED PRECEDING = from the very first row of the partition (the beginning).
>>CURRENT ROW = up to the row you are currently on.
>>So together:
“Start from the first row and include everything up to the current row.”*/
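
The explicit ROWS frame matters when salaries tie. If ORDER BY is given without a frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats tied rows as one group and gives them the same total. The sketch below compares the two; the alias names and the `emp_id` tie-breaker are assumptions for illustration.

```sql
SELECT emp_name, salary,
       SUM(salary) OVER (ORDER BY salary, emp_id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_rows,
       -- No frame given: default RANGE frame includes all rows tied with the current one
       SUM(salary) OVER (ORDER BY salary)                                  AS running_total_range
FROM data_engineering_practice.sql.employees
ORDER BY salary, emp_id;

-- First four rows:
-- Amit   40000   40000   40000
-- Arjun  45000   85000   85000
-- Neha   50000  135000  185000   <- RANGE already includes Ritu's tied 50000
-- Ritu   50000  185000  185000
```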


### Custom Code:

Show bucket names instead of numbers for NTILE(n). Three options follow: a generic "Group n" label, a CASE mapping, and a join to a small lookup CTE.

In [0]:
/*
Show as Group 1, Group 2, Group 3, etc.
*/
SELECT emp_name,
       salary,
       CONCAT('Group ', NTILE(3) OVER (ORDER BY salary DESC)) AS bucket_label
FROM data_engineering_practice.sql.employees;


In [0]:
SELECT emp_name,
       salary,
       NTILE(3) OVER (ORDER BY salary DESC) AS bucket_num,
       CASE NTILE(3) OVER (ORDER BY salary DESC)
            WHEN 1 THEN 'High'
            WHEN 2 THEN 'Medium'
            WHEN 3 THEN 'Low'
       END AS bucket_label
FROM data_engineering_practice.sql.employees;


In [0]:
WITH labels AS (
    SELECT 1 AS bucket, 'High' AS label UNION ALL
    SELECT 2, 'Medium' UNION ALL
    SELECT 3, 'Low'
)
SELECT e.emp_name,
       e.salary,
       e.bucket_num,
       l.label AS bucket_label
FROM (
    SELECT emp_name, salary,
           NTILE(3) OVER (ORDER BY salary DESC) AS bucket_num
    FROM data_engineering_practice.sql.employees
) e
JOIN labels l
  ON e.bucket_num = l.bucket;


### Handle NULL values (without COALESCE)

In [0]:
CREATE OR REPLACE TABLE data_engineering_practice.sql.emp_ids (
  emp_id INT PRIMARY KEY,
  aadhaar VARCHAR(12)
);

INSERT INTO data_engineering_practice.sql.emp_ids VALUES
(1,NULL),
(2,NULL),
(3,NULL),
(4,'567856785678'),
(5,NULL),
(6,'999988887777'),
(7,NULL),
(8,NULL),
(9,NULL),
(10,NULL);


In [0]:
SELECT emp_id,
       aadhaar,
       CASE 
         -- if forward-fill is still null (happens at the very beginning),
         -- then use the next non-null value
         WHEN LAST_VALUE(aadhaar) IGNORE NULLS 
                OVER (ORDER BY emp_id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) IS NULL
         THEN FIRST_VALUE(aadhaar) IGNORE NULLS 
                OVER (ORDER BY emp_id
                      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
         
         -- otherwise, just use the forward-filled value
         ELSE LAST_VALUE(aadhaar) IGNORE NULLS 
                OVER (ORDER BY emp_id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       END AS aadhaar_filled
FROM data_engineering_practice.sql.emp_ids
ORDER BY emp_id;


1️⃣ The problem we’re solving

Normally, we want to “fill down” the Aadhaar values if they’re missing.

But if the first row(s) are NULL, forward-filling has nothing to copy yet… so it stays NULL.

That’s where our CASE comes in: it says “if you can’t fill from the past, then look into the future.”

2️⃣ The key logic (the CASE)
CASE 
  WHEN (forward_fill) IS NULL
  THEN (back_fill)
  ELSE (forward_fill)
END


Think of it as:
👉 “Check the forward-fill result. If it’s NULL, fall back to the back-fill. Otherwise, stick with the forward-fill.”

3️⃣ What’s “forward-fill”?

This line:

LAST_VALUE(aadhaar) IGNORE NULLS 
  OVER (ORDER BY emp_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)


LAST_VALUE → “Give me the most recent Aadhaar value so far in the order of emp_id.”

IGNORE NULLS → skip over blanks when looking back. Without it, LAST_VALUE over this frame would just return the current row’s own value, NULL or not.

Window frame (ROWS BETWEEN ...) → “Start at the very first row and look through the current row.”

So, it’s like dragging the last non-null value downward.

4️⃣ What’s “back-fill”?

This line:

FIRST_VALUE(aadhaar) IGNORE NULLS 
  OVER (ORDER BY emp_id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)


FIRST_VALUE → “Give me the first Aadhaar I see in this window.”

Window frame (from current row to the end) → looks ahead in the table.

With IGNORE NULLS, it finds the next non-null Aadhaar.

So, if forward-fill fails (like at the top), we borrow the next available non-null value.

5️⃣ Together in the CASE

If forward-fill is NULL → use back-fill (look ahead).

Otherwise → use forward-fill (look behind).

This ensures:

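The first row(s) don’t stay empty if they start as NULL (here emp_id 1–3 take '567856785678' from emp_id 4).

All middle rows get values carried forward.

Runs of consecutive NULLs, including trailing ones, all receive the same carried value (here emp_id 7–10 take '999988887777').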

✅ Quick mental image:

Forward-fill = dragging values downwards.

Back-fill = pulling values upwards.

CASE = chooses which one makes sense at that moment.
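
To make the three pieces visible, the sketch below computes the forward-fill and back-fill as separate columns in a CTE and then applies the same CASE. The CTE name `fills` and the column names `forward_fill` and `back_fill` are illustrative, not part of the original query; the result in `aadhaar_filled` should match the query above.

```sql
WITH fills AS (
    SELECT emp_id,
           aadhaar,
           -- drag the last non-null value downward
           LAST_VALUE(aadhaar) IGNORE NULLS
               OVER (ORDER BY emp_id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS forward_fill,
           -- pull the next non-null value upward
           FIRST_VALUE(aadhaar) IGNORE NULLS
               OVER (ORDER BY emp_id
                     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS back_fill
    FROM data_engineering_practice.sql.emp_ids
)
SELECT emp_id,
       aadhaar,
       forward_fill,
       back_fill,
       CASE WHEN forward_fill IS NULL THEN back_fill
            ELSE forward_fill
       END AS aadhaar_filled
FROM fills
ORDER BY emp_id;

-- emp_id  forward_fill   back_fill      aadhaar_filled
-- 1-3     NULL           567856785678   567856785678
-- 4       567856785678   567856785678   567856785678
-- 5       567856785678   999988887777   567856785678
-- 6       999988887777   999988887777   999988887777
-- 7-10    999988887777   NULL           999988887777
```

Naming the two fills once also avoids repeating the forward-fill window expression in both the WHEN and the ELSE branch.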