In [0]:
%sql
CREATE TABLE customers (
    customer_id INT,
    customer_name VARCHAR(50),
    city VARCHAR(50),
    updated_at DATE
);

INSERT INTO customers VALUES
(1, 'Amit',  'Delhi',  '2024-01-01'),
(1, 'Amit',  'Mumbai', '2024-01-10'),
(2, 'Riya',  'Pune',   '2024-01-05'),
(2, 'Riya',  'Pune',   '2024-01-05'),
(3, 'John',  'Chennai','2024-01-03');


##1. Business Scenario

You receive customer data daily.
Due to upstream issues, duplicate customer records are created.

👉 Requirement:
Keep only the latest record per customer, based on updated_at.
Write a SQL query that returns only the latest record for each customer.

👉 Expected output:

| customer_id | customer_name | city    | updated_at |
| ----------- | ------------- | ------- | ---------- |
| 1           | Amit          | Mumbai  | 2024-01-10 |
| 2           | Riya          | Pune    | 2024-01-05 |
| 3           | John          | Chennai | 2024-01-03 |


In [0]:
%sql
select * from customers qualify row_number() over(partition by customer_id order by updated_at desc)=1
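
QUALIFY is not available in every SQL engine. As a hedged sketch, the same result can be produced by ranking in a subquery and filtering outside it. The alias `rn` and the subquery name `ranked` are introduced here only for illustration.

```sql
-- Portable equivalent of the QUALIFY query: rank in a subquery, filter outside.
-- `rn` and `ranked` are illustrative names, not part of the original tables.
SELECT customer_id, customer_name, city, updated_at
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM customers c
) ranked
WHERE rn = 1;
```

Customer 2's two rows are identical, so it does not matter which one ROW_NUMBER keeps. If rows for a customer shared the same updated_at but differed in other columns, the pick would be arbitrary; adding a tie-breaker column to the ORDER BY would make it deterministic.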

In [0]:
%sql
CREATE TABLE source_customers (
    customer_id INT
);

INSERT INTO source_customers VALUES
(1),(2),(3),(4),(5);
CREATE TABLE target_customers (
    customer_id INT
);

INSERT INTO target_customers VALUES
(1),(2),(4);


##2. Scenario 2
Find customer IDs that are present in source_customers but missing from target_customers.

In [0]:
%sql
select * from source_customers s anti join target_customers t on s.customer_id=t.customer_id
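
The anti join syntax is not supported by every engine. Two commonly used alternatives are sketched below, under the assumption that customer_id is the only matching key; on the sample data both should return customer IDs 3 and 5.

```sql
-- Alternative 1: NOT EXISTS (behaves sensibly even if the target contains NULL keys).
SELECT s.customer_id
FROM source_customers s
WHERE NOT EXISTS (
    SELECT 1
    FROM target_customers t
    WHERE t.customer_id = s.customer_id
);

-- Alternative 2: LEFT JOIN, keeping only source rows that found no match.
SELECT s.customer_id
FROM source_customers s
LEFT JOIN target_customers t
  ON s.customer_id = t.customer_id
WHERE t.customer_id IS NULL;
```

A `NOT IN (subquery)` form is also possible, but it returns no rows at all when the subquery yields a NULL, which is why NOT EXISTS is usually the safer choice.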

In [0]:
%sql
CREATE TABLE employee_sales (
    emp_id INT,
    emp_name VARCHAR(50),
    dept VARCHAR(50),
    region VARCHAR(50),
    sale_date DATE,
    sales_amount INT
);
INSERT INTO employee_sales VALUES
(1,'Amit','IT','North','2024-01-01',1000),
(1,'Amit','IT','North','2024-01-10',2000),
(2,'Riya','IT','South','2024-01-05',1500),
(2,'Riya','IT','South','2024-01-20',3000),
(3,'John','HR','North','2024-01-03',1200),
(3,'John','HR','North','2024-01-15',1800),
(4,'Sara','HR','South','2024-01-08',2200),
(5,'Mike','Finance','West','2024-01-12',2500);


##3. Scenario 3

Find total sales per employee.

👉 Output: emp_id, emp_name, total_sales

In [0]:
%sql
select emp_id,emp_name, sum(sales_amount) as total_sales from employee_sales group by emp_id,emp_name

##4. Scenario 4
For each department, find the top 1 employee by total sales.

⚠️ Conditions:

First calculate total sales per employee

Then rank employees within each department

Handle ties correctly (don’t randomly drop tied employees)

👉 Expected columns:

dept, emp_id, emp_name, total_sales

In [0]:
%sql
with temp_sales 
as(
  select emp_id,emp_name,dept, sum(sales_amount) as total_sales from employee_sales group by emp_id,emp_name,dept
)
select * from temp_sales qualify rank() over(partition by dept order by total_sales desc)=1
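
To see why RANK is the right choice for the tie-handling condition, the sketch below puts the three ranking functions side by side over the same per-employee totals. The aliases `rn`, `rnk`, and `dense_rnk` are illustrative only.

```sql
WITH temp_sales AS (
    SELECT emp_id, emp_name, dept, SUM(sales_amount) AS total_sales
    FROM employee_sales
    GROUP BY emp_id, emp_name, dept
)
SELECT dept,
       emp_id,
       emp_name,
       total_sales,
       -- ROW_NUMBER gives tied employees different numbers, so "= 1" drops one of them.
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY total_sales DESC) AS rn,
       -- RANK gives tied employees the same rank and leaves a gap after them.
       RANK()       OVER (PARTITION BY dept ORDER BY total_sales DESC) AS rnk,
       -- DENSE_RANK gives tied employees the same rank with no gap.
       DENSE_RANK() OVER (PARTITION BY dept ORDER BY total_sales DESC) AS dense_rnk
FROM temp_sales
ORDER BY dept, total_sales DESC;
```

Filtering on `rnk = 1` or `dense_rnk = 1` keeps every tied leader, while `rn = 1` keeps exactly one row per department. The sample data contains no ties, so all three columns agree here.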


##5. Advanced Windowing + Time Logic


For each employee, show:

sale_date

sales_amount

previous sale amount

difference from previous sale

⚠️ Rules:

Compare only within the same employee

First sale should show NULL difference

👉 Expected columns:

In [0]:
%sql
select sale_date,sales_amount, lag(sales_amount) over(partition by emp_id order by sale_date asc) as previous_sale_amount from employee_sales

⭐ Optional Enhancements
1️⃣ Provide a default value instead of NULL
LAG(sales_amount, 1, 0) OVER (
    PARTITION BY emp_id
    ORDER BY sale_date
)

2️⃣ Add sale difference (very common interview ask)
%sql
SELECT
    emp_id,
    sale_date,
    sales_amount,
    sales_amount -
    LAG(sales_amount) OVER (
        PARTITION BY emp_id
        ORDER BY sale_date
    ) AS sale_diff
FROM employee_sales;

3️⃣ Descending order (LAG then returns the chronologically next sale)
LAG(sales_amount) OVER (
    PARTITION BY emp_id
    ORDER BY sale_date DESC
)
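
Reversing the sort order makes LAG look forward in time, which is what LEAD does over an ascending order. The sketch below shows both forms side by side, assuming each employee has at most one sale per date; the aliases `next_sale_via_lag` and `next_sale_via_lead` are illustrative.

```sql
-- Both columns should hold the same value on every row:
-- the amount of the employee's next sale, or NULL for their last sale.
SELECT emp_id,
       sale_date,
       sales_amount,
       LAG(sales_amount)  OVER (PARTITION BY emp_id ORDER BY sale_date DESC) AS next_sale_via_lag,
       LEAD(sales_amount) OVER (PARTITION BY emp_id ORDER BY sale_date)      AS next_sale_via_lead
FROM employee_sales
ORDER BY emp_id, sale_date;
```

LEAD with an ascending order is usually easier to read than LAG with a descending one.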


##6.
For each employee, show:

sale_date

sales_amount

previous sale amount

difference from previous sale

⚠️ Rules:

Compare only within the same employee

First sale should show NULL difference

👉 Expected columns:

emp_id, sale_date, sales_amount, prev_sales, diff_from_prev

In [0]:
%sql
select emp_id,sale_date,sales_amount,lag(sales_amount) over(partition by emp_id order by sale_date asc) as prev_sales,(sales_amount-prev_sales) as diff_from_prev from employee_sales
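
The query above reuses the prev_sales alias inside the same SELECT list, which only works on engines that support lateral column aliases. As a hedged, more portable sketch, the window function can be computed in a CTE first; the CTE name `with_prev` is illustrative.

```sql
WITH with_prev AS (
    SELECT emp_id,
           sale_date,
           sales_amount,
           LAG(sales_amount) OVER (
               PARTITION BY emp_id
               ORDER BY sale_date
           ) AS prev_sales
    FROM employee_sales
)
SELECT emp_id,
       sale_date,
       sales_amount,
       prev_sales,
       -- NULL for each employee's first sale, because prev_sales is NULL there.
       sales_amount - prev_sales AS diff_from_prev
FROM with_prev
ORDER BY emp_id, sale_date;
```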

##7.
Business wants to flag employees whose sales dropped compared to their previous sale.

👉 Requirement:

Compare each sale with the previous sale for the same employee

Flag only rows where current sale < previous sale

👉 Expected output:

emp_id, sale_date, sales_amount, prev_sales

In [0]:
%sql
select * from (select emp_id,sale_date,sales_amount,lag(sales_amount) over(partition by emp_id order by sale_date asc) as prev_sales from employee_sales) t where sales_amount < prev_sales
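
Since the earlier cells already rely on QUALIFY, the same filter can be sketched without the derived table. This assumes the engine lets QUALIFY reference a window-function alias from the SELECT list.

```sql
SELECT emp_id,
       sale_date,
       sales_amount,
       LAG(sales_amount) OVER (
           PARTITION BY emp_id
           ORDER BY sale_date
       ) AS prev_sales
FROM employee_sales
-- Each employee's first sale has prev_sales = NULL, so the comparison
-- evaluates to NULL and the row is filtered out automatically.
QUALIFY sales_amount < prev_sales;
```

On the sample data every employee's sales increase from one sale to the next, so both versions return no rows.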

##8.
For each department, find the employee who contributed the highest percentage of total department sales.

👉 Steps (what the interviewer expects you to think through):

Total sales per employee

Total sales per department

Calculate employee contribution %

Pick top contributor per department

👉 Expected columns:

dept, emp_id, emp_name, contribution_pct

In [0]:
%sql

with total_sales 
as
(
  select emp_id,emp_name,dept,sum(sales_amount) as total_sale from employee_sales group by emp_id,emp_name,dept
),
temp_sales
as
(
  select *,sum(total_sale) over(partition by dept) as dept_sales,(total_sale/dept_sales)*100 as per_sales from total_sales
)
select dept,emp_id,emp_name,per_sales as contribution_pct from temp_sales qualify rank() over(partition by dept order by per_sales desc)=1


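One Subtle but CRITICAL Detail (Very Important)
🚨 Integer Division Bug

If total_sale and dept_sales are both INT, this expression can silently truncate on engines where / applied to two integers performs integer division (SQL Server and PostgreSQL, for example):

(total_sale / dept_sales) * 100


👉 Example:

2000 / 10000 = 0   ❌ (integer division)

In Spark SQL, the / operator returns a fractional result even for integer operands (integer division is the separate div operator), so the query above is not affected there. The fix below keeps the calculation correct on other engines too.

✅ Interview-Safe Fix (Say This!)
(total_sale * 100.0 / dept_sales) as per_sales


or

(cast(total_sale as decimal(10,2)) / dept_sales) * 100


💡 Mentioning this in an interview earns bonus points.

⭐ What to Say in Interview

“I always force decimal division when calculating percentages to avoid silent truncation.”

That’s real production experience.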
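
As a hedged sketch, here is the contribution query with the decimal-safe expression applied and without lateral column aliases or QUALIFY, so it should port to more engines. The CTE names `emp_totals` and `scored` and the alias `rnk` are illustrative.

```sql
WITH emp_totals AS (
    SELECT emp_id, emp_name, dept, SUM(sales_amount) AS total_sale
    FROM employee_sales
    GROUP BY emp_id, emp_name, dept
),
scored AS (
    SELECT dept,
           emp_id,
           emp_name,
           -- Multiplying by 100.0 first forces non-integer division.
           ROUND(total_sale * 100.0 / SUM(total_sale) OVER (PARTITION BY dept), 2) AS contribution_pct,
           -- Rank on the unrounded total so rounding cannot create artificial ties.
           RANK() OVER (PARTITION BY dept ORDER BY total_sale DESC) AS rnk
    FROM emp_totals
)
SELECT dept, emp_id, emp_name, contribution_pct
FROM scored
WHERE rnk = 1;
```

On the sample data this should pick Riya for IT (60%), John for HR (about 57.69%), and Mike for Finance (100%).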

##9.
You receive daily sales data in employee_sales.
You want to load only NEW records into a target table employee_sales_hist.

👉 Condition:

Load records only if (emp_id, sale_date) does NOT already exist in target

👉 Tables:

employee_sales → source

employee_sales_hist → target

✍️ Write the SQL to identify NEW records only.

with new_sales as
(select * from employee_sales e anti join employee_sales_hist h on e.emp_id=h.emp_id and e.sale_date=h.sale_date)
insert into employee_sales_hist select * from new_sales

or

insert into employee_sales_hist
select *
from employee_sales e
where not exists (
    select 1
    from employee_sales_hist h
    where e.emp_id = h.emp_id
      and e.sale_date = h.sale_date
);
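
A third option is an insert-only MERGE, sketched below. It assumes employee_sales_hist is a table type that supports MERGE (a Delta table, for example) and has the same columns as employee_sales; `INSERT *` is the Databricks/Delta shorthand for inserting every source column.

```sql
-- Insert-only MERGE: rows whose (emp_id, sale_date) already exist are left untouched.
MERGE INTO employee_sales_hist AS h
USING employee_sales AS e
  ON  h.emp_id    = e.emp_id
  AND h.sale_date = e.sale_date
WHEN NOT MATCHED THEN
  INSERT *;
```

Like the two queries above, this only protects against keys that are already in the target. If the source itself contains two rows with the same (emp_id, sale_date), both are inserted, so the source may need deduplicating first.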


##10.
Employee department can change over time.
We must track history.

Target Table: employee_dim
emp_id
emp_name
dept
start_date
end_date
is_current

Requirement:

When dept changes:

Expire old record (end_date = current_date - 1, is_current = 'N')

Insert new record (start_date = current_date, end_date = '9999-12-31', is_current = 'Y')

If no change → do nothing

Source Table (Daily Snapshot / CDC)

Let’s call it: employee_src

This represents latest state of employee data coming daily from HR.

employee_src
------------
emp_id
emp_name
dept
Take your time: this is the last and hardest one 👑

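-- First attempt: a single MERGE against the tables named in the requirement.
-- Caveat: a changed employee matches their current row, so that row is only expired;
-- the new version is never inserted. Only employees with no current row reach the INSERT branch.
MERGE INTO employee_dim AS d
USING employee_src AS e
ON d.emp_id = e.emp_id
AND d.is_current = 'Y'

WHEN MATCHED AND d.dept != e.dept THEN
  UPDATE SET
    d.end_date = current_date - 1,
    d.is_current = 'N'

WHEN NOT MATCHED BY TARGET THEN
  INSERT (emp_id, emp_name, dept, start_date, end_date, is_current)
  VALUES (e.emp_id, e.emp_name, e.dept, current_date, '9999-12-31', 'Y')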

✅ Correct Production-Grade SCD Type-2 Pattern
🔹 Step 1: Expire changed records
MERGE INTO employee_dim d
USING employee_src e
ON d.emp_id = e.emp_id
AND d.is_current = 'Y'

WHEN MATCHED AND d.dept <> e.dept THEN
  UPDATE SET
    d.end_date = current_date - 1,
    d.is_current = 'N';

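🔹 Step 2: Insert new & changed records
-- After Step 1, changed employees no longer have a current row, so joining
-- only to current rows picks up both brand-new and changed employees.
INSERT INTO employee_dim
(emp_id, emp_name, dept, start_date, end_date, is_current)
SELECT e.emp_id,
       e.emp_name,
       e.dept,
       current_date,
       '9999-12-31',
       'Y'
FROM employee_src e
LEFT JOIN employee_dim d
  ON e.emp_id = d.emp_id
 AND d.is_current = 'Y'
WHERE d.emp_id IS NULL;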


👉 This guarantees:

One active row per employee

Full history preserved

No duplicates
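
These guarantees can be verified after each load. The two checks below are a minimal sketch using only the columns listed for employee_dim and employee_src; both queries should return no rows when the load is correct.

```sql
-- Check 1: no employee has more than one active row.
SELECT emp_id, COUNT(*) AS active_rows
FROM employee_dim
WHERE is_current = 'Y'
GROUP BY emp_id
HAVING COUNT(*) > 1;

-- Check 2: every source employee has a current row with a matching dept.
SELECT e.emp_id, e.dept AS src_dept, d.dept AS dim_dept
FROM employee_src e
LEFT JOIN employee_dim d
  ON e.emp_id = d.emp_id
 AND d.is_current = 'Y'
WHERE d.emp_id IS NULL
   OR d.dept <> e.dept;
```

Any row returned by Check 2 points to a changed or new employee whose active version was not inserted.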

⭐ What to Say in Interview (This is GOLD)

“In most databases, SCD Type-2 is implemented as a two-step process:
first expire old records, then insert the new active version.”

That sentence alone signals real production experience.