## COMMON TABLE EXPRESSIONS (CTEs) in PostgreSQL

- A Common Table Expression (CTE) is a named, temporary result set defined with a `WITH` clause. It exists only for the duration of the statement and can be referenced within a `SELECT`, `INSERT`, `UPDATE`, or `DELETE` statement. CTEs improve query readability and modularity, especially for complex queries.



In [None]:
-- Basic CTE

WITH avg_salary AS (
    SELECT AVG(salary) AS avg_salary
    FROM employees
)
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT avg_salary FROM avg_salary);
-- Finds employees whose salary is greater than the average salary.

In [None]:
-- Multiple CTEs

-- Find employees in departments whose total salary exceeds 100000.

WITH department_salary AS (
    SELECT department_id, SUM(salary) AS total_salary
    FROM employees
    GROUP BY department_id
),
high_salary_departments AS (
    SELECT department_id
    FROM department_salary
    WHERE total_salary > 100000
)
SELECT e.employee_name, e.department_id
FROM employees e
JOIN high_salary_departments hsd ON e.department_id = hsd.department_id;
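The definition above also mentions `UPDATE` statements. As a minimal sketch of that case, the block below uses a CTE to compute per-department averages and then raises every below-average salary by 10%. The table `cte_update_demo` and its rows are made up for illustration and are not part of the original notes.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TEMP TABLE cte_update_demo (
    employee_id   INT PRIMARY KEY,
    employee_name TEXT NOT NULL,
    department_id INT NOT NULL,
    salary        NUMERIC NOT NULL
);

INSERT INTO cte_update_demo VALUES
    (1, 'Asha',  10, 50000),
    (2, 'Bruno', 10, 70000),
    (3, 'Chen',  20, 40000),
    (4, 'Dara',  20, 40000);

-- The CTE computes each department's average once; the UPDATE joins to it.
WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM cte_update_demo
    GROUP BY department_id
)
UPDATE cte_update_demo e
SET salary = e.salary * 1.10
FROM dept_avg d
WHERE e.department_id = d.department_id
  AND e.salary < d.avg_salary;

-- Only Asha is below her department's average (60000), so only her row changes.
SELECT employee_name, salary
FROM cte_update_demo
ORDER BY employee_id;
```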



In [None]:
-- Recursive CTE

WITH RECURSIVE employee_hierarchy AS (
    SELECT employee_id, manager_id, employee_name
    FROM employees
    WHERE manager_id IS NULL  -- Start with top-level manager
    UNION ALL
    SELECT e.employee_id, e.manager_id, e.employee_name
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy;


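-- Returns the whole reporting hierarchy, starting from the top-level managers (manager_id IS NULL).
-- To list only the employees under one specific manager, change the anchor condition to match that manager's employee_id.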
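To restrict the walk to a single manager's reports, the anchor query can select that manager instead of the top-level rows. The sketch below also tracks how many levels down each report sits. The table `org_demo`, its rows, and the choice of manager id 2 are assumptions made for illustration.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TEMP TABLE org_demo (
    employee_id   INT PRIMARY KEY,
    manager_id    INT,
    employee_name TEXT NOT NULL
);

INSERT INTO org_demo VALUES
    (1, NULL, 'Priya'),   -- top-level manager
    (2, 1,    'Quinn'),
    (3, 1,    'Ravi'),
    (4, 2,    'Sam'),
    (5, 4,    'Tess');

WITH RECURSIVE reports AS (
    -- Anchor: the manager whose subtree we want (id 2 is an arbitrary choice).
    SELECT employee_id, manager_id, employee_name, 0 AS depth
    FROM org_demo
    WHERE employee_id = 2
    UNION ALL
    -- Recursive step: add the direct reports of every row found so far.
    SELECT e.employee_id, e.manager_id, e.employee_name, r.depth + 1
    FROM org_demo e
    JOIN reports r ON e.manager_id = r.employee_id
)
SELECT employee_name, depth
FROM reports
WHERE depth > 0            -- exclude the manager's own row
ORDER BY depth, employee_name;
-- Returns Sam (depth 1) and Tess (depth 2).
```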

## TABLE PARTITIONING

Definition:
- Table partitioning is a database optimization technique where a large table is divided into
smaller, more manageable pieces (partitions) based on the value of one or more columns, using a
range, list, or hash rule. Each partition is stored as its own table, and queries that filter on
the partition key can skip partitions that cannot contain matching rows (partition pruning).




Use Cases:

- Log Management: Partition logs by date to improve query performance for recent logs.

- Archiving: Keep older data in separate partitions for archival purposes.
- Performance: Speeds up queries by scanning only relevant partitions instead of the entire table.

#### 1. Partition by Range
- Divides the table into partitions based on a range of values in a column.

- Use Case: Time-series data, such as logs or events.


Example:

In [None]:
-- Create a partitioned table for logs
CREATE TABLE logs (
  log_id SERIAL,
  log_time TIMESTAMP NOT NULL,
  message TEXT
) PARTITION BY RANGE (log_time);

-- Create partitions for specific date ranges
CREATE TABLE logs_2024 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE logs_2025 PARTITION OF logs FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
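A quick way to see range partitioning at work is to insert rows on either side of a boundary and ask PostgreSQL where each one landed. This sketch uses a separate table, `logs_demo`, with made-up rows (both are assumptions) so it can run alongside the example above. Range bounds are inclusive at the lower end and exclusive at the upper end.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TABLE logs_demo (
    log_time TIMESTAMP NOT NULL,
    message  TEXT
) PARTITION BY RANGE (log_time);

CREATE TABLE logs_demo_2024 PARTITION OF logs_demo
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE logs_demo_2025 PARTITION OF logs_demo
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

INSERT INTO logs_demo VALUES
    ('2024-12-31 23:59:59', 'last second of 2024'),
    ('2025-01-01 00:00:00', 'first instant of 2025');

-- tableoid::regclass names the partition that physically holds each row.
SELECT tableoid::regclass AS stored_in, log_time, message
FROM logs_demo
ORDER BY log_time;

-- With pruning, the plan should mention only logs_demo_2025.
EXPLAIN (COSTS OFF)
SELECT *
FROM logs_demo
WHERE log_time >= '2025-03-01' AND log_time < '2025-04-01';
```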

-------

#### 2. Partition by List
- Divides the table into partitions based on a list of discrete values.

- Use Case: Categorized data, such as regions or product categories.

Example:


In [None]:
CREATE TABLE sales (
  sale_id SERIAL,
  region TEXT NOT NULL,
  amount NUMERIC
) PARTITION BY LIST (region);

CREATE TABLE sales_north PARTITION OF sales FOR VALUES IN ('North');
CREATE TABLE sales_south PARTITION OF sales FOR VALUES IN ('South');

----------

#### 3. Partition by Hash
- Divides the table into partitions based on a hash function applied to a column's value.

- Use Case: Distributing data evenly across partitions for load balancing.

Example:



In [None]:
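-- Each row is routed to the partition whose REMAINDER equals the hash of user_id modulo MODULUS.
-- With MODULUS 4, all four remainders (0 to 3) need a partition; otherwise inserts that hash
-- to a missing remainder fail.

CREATE TABLE users (
  user_id SERIAL,
  username TEXT NOT NULL
) PARTITION BY HASH (user_id);

CREATE TABLE users_part_1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_part_2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_part_3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_part_4 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);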
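To check the "even distribution" claim, one option is to load some rows and count how many landed in each partition. The sketch below does that with a separate table, `users_demo`, and 1000 generated rows (both are assumptions). The exact counts depend on PostgreSQL's internal hash function, so expect them to be roughly, not exactly, equal.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TABLE users_demo (
    user_id  INT NOT NULL,
    username TEXT NOT NULL
) PARTITION BY HASH (user_id);

CREATE TABLE users_demo_p0 PARTITION OF users_demo FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_demo_p1 PARTITION OF users_demo FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_demo_p2 PARTITION OF users_demo FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_demo_p3 PARTITION OF users_demo FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Generate 1000 users with ids 1..1000.
INSERT INTO users_demo (user_id, username)
SELECT n, 'user_' || n
FROM generate_series(1, 1000) AS n;

-- Count rows per partition; tableoid identifies the partition holding each row.
SELECT tableoid::regclass AS stored_in, COUNT(*) AS row_count
FROM users_demo
GROUP BY tableoid
ORDER BY tableoid::regclass::text;
```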

--------

#### 4. Partition by Range with Subpartitioning
- Combines range partitioning with another method (e.g., hash or list) for more granular control.

- Use Case: Multi-level partitioning, such as time-based logs further divided by region.

In [None]:
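-- Note: this reuses the table name logs from the range example; drop that table or pick another name before running.
CREATE TABLE logs (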
  log_id SERIAL,
  log_time TIMESTAMP NOT NULL,
  region TEXT
) PARTITION BY RANGE (log_time);

CREATE TABLE logs_2024 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
  PARTITION BY LIST (region);

CREATE TABLE logs_2024_north PARTITION OF logs_2024 FOR VALUES IN ('North');
CREATE TABLE logs_2024_south PARTITION OF logs_2024 FOR VALUES IN ('South');
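With two levels of partitioning it helps to confirm where a row ends up and to see the whole hierarchy at a glance. The sketch below uses a separate table, `events_demo` (an assumed name, with made-up data), and the `pg_partition_tree` function, which is available in PostgreSQL 12 and later.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TABLE events_demo (
    event_time TIMESTAMP NOT NULL,
    region     TEXT NOT NULL
) PARTITION BY RANGE (event_time);

-- First level: one partition per year, itself partitioned by region.
CREATE TABLE events_demo_2024 PARTITION OF events_demo
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY LIST (region);

-- Second level: leaf partitions that actually store rows.
CREATE TABLE events_demo_2024_north PARTITION OF events_demo_2024 FOR VALUES IN ('North');
CREATE TABLE events_demo_2024_south PARTITION OF events_demo_2024 FOR VALUES IN ('South');

INSERT INTO events_demo VALUES ('2024-06-15 12:00:00', 'South');

-- The row is stored in the leaf partition events_demo_2024_south.
SELECT tableoid::regclass AS stored_in, event_time, region
FROM events_demo;

-- List every table in the hierarchy with its parent and nesting level.
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree('events_demo');
```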


---------------

#### 5. Default Partition
- A default partition stores rows that do not match any other partition. It is available for range and list partitioning, but not for hash partitioning.

- Use Case: Handling unexpected or uncategorized data.

Example:

In [None]:
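-- Note: this reuses the table name sales from the list example; drop that table or pick another name before running.
CREATE TABLE sales (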
  sale_id SERIAL,
  region TEXT NOT NULL,
  amount NUMERIC
) PARTITION BY LIST (region);


CREATE TABLE sales_north PARTITION OF sales FOR VALUES IN ('North');
CREATE TABLE sales_default PARTITION OF sales DEFAULT;



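-- The two partitions can also be created in the opposite order:
-- CREATE TABLE sales_default PARTITION OF sales DEFAULT;
-- CREATE TABLE sales_north PARTITION OF sales FOR VALUES IN ('North');

-- The result is the same: a row goes to the default partition only when it matches no other partition.
-- Caveat: once the default partition holds rows, adding a partition whose values match any of those rows fails.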
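The routing rule can be checked by inserting a mix of listed and unlisted values. This sketch uses a separate table, `sales_demo`, with made-up rows (both are assumptions), and then summarizes what has accumulated in the default partition, which is a reasonable first step before deciding whether a value deserves its own partition.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TABLE sales_demo (
    region TEXT NOT NULL,
    amount NUMERIC
) PARTITION BY LIST (region);

CREATE TABLE sales_demo_north   PARTITION OF sales_demo FOR VALUES IN ('North');
CREATE TABLE sales_demo_default PARTITION OF sales_demo DEFAULT;

INSERT INTO sales_demo VALUES
    ('North', 100),
    ('East',  250),
    ('West',   75);

-- 'North' lands in sales_demo_north; 'East' and 'West' fall through to the default.
SELECT tableoid::regclass AS stored_in, region, amount
FROM sales_demo
ORDER BY region;

-- Which values are currently landing in the default partition?
SELECT region, COUNT(*) AS row_count
FROM sales_demo_default
GROUP BY region
ORDER BY region;
```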

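## WINDOW FUNCTIONS

Window functions compute a value for each row from a set of rows related to the current row (its window). Unlike aggregates used with `GROUP BY`, they do not collapse rows, which makes them powerful tools for ranking, comparisons, and running aggregations.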

#### 1. ROW_NUMBER
- Definition:
Assigns a unique sequential number to each row within a partition, starting from 1.

- Use Case:
    - Paginating results (e.g., displaying 10 rows per page).
    - Identifying the first or last row in a group.
    
Example:



In [None]:
-- Assign a unique row number to employees ordered by salary
SELECT employee_name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;


-- Number employees within each department, ordered by salary; numbering restarts at 1 for each department
SELECT employee_name, department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM employees;

Best Practices:
- Use ROW_NUMBER when you need unique numbering, even if there are ties.
- Combine with PARTITION BY to reset numbering for each group.
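The pagination use case can be sketched as follows. A window function cannot appear directly in `WHERE`, so the numbering is computed in a CTE and filtered in the outer query. The inline rows, the alias `staff_demo`, and the page size of 2 are assumptions for illustration.

```sql
WITH numbered AS (
    SELECT employee_name, salary,
           -- employee_name breaks salary ties so page contents stay stable
           ROW_NUMBER() OVER (ORDER BY salary DESC, employee_name) AS row_num
    FROM (VALUES
            ('Asha',  90000),
            ('Bruno', 80000),
            ('Chen',  80000),
            ('Dara',  70000),
            ('Eli',   60000)
         ) AS staff_demo(employee_name, salary)   -- hypothetical inline data
)
SELECT employee_name, salary, row_num
FROM numbered
WHERE row_num BETWEEN 3 AND 4   -- page 2 when each page holds 2 rows
ORDER BY row_num;
-- Returns Chen (row 3) and Dara (row 4).
```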

------------------

#### 2. RANK
- Definition:
Assigns a rank to each row within a partition. Rows with the same value receive the same rank, and the ranks that follow a tie are skipped, leaving gaps (e.g., 1, 2, 2, 4).

- Use Case:
    - Leaderboards where ties are allowed, and gaps in ranking are acceptable.

Example:



In [None]:
-- Rank employees by salary
SELECT employee_name, salary, RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;


-- Add PARTITION BY to the OVER clause to rank employees within each department.

Best Practices:

- Use RANK when ties are acceptable, and gaps in ranking are meaningful.
- Combine with PARTITION BY to rank within specific groups (e.g., departments).

---------------

#### 3. DENSE_RANK
- Definition:
Similar to RANK, but does not skip ranks when there are ties (e.g., 1, 2, 2, 3).

- Use Case:
    - Rankings where consecutive numbering is required, even with ties.
    
Example:

In [None]:
-- Dense rank employees by salary
SELECT employee_name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

-- Add PARTITION BY to the OVER clause to dense-rank employees within each department.
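Putting the three numbering functions side by side on data with a tie makes the difference concrete. The inline rows and the alias `rank_demo` below are assumptions for illustration; the named `WINDOW` clause just avoids repeating the same `OVER` definition three times.

```sql
SELECT employee_name, salary,
       ROW_NUMBER() OVER w AS row_num,
       RANK()       OVER w AS salary_rank,
       DENSE_RANK() OVER w AS salary_dense_rank
FROM (VALUES
        ('Asha',  90000),
        ('Bruno', 80000),
        ('Chen',  80000),
        ('Dara',  70000)
     ) AS rank_demo(employee_name, salary)   -- hypothetical inline data
WINDOW w AS (ORDER BY salary DESC)
ORDER BY salary DESC, employee_name;

-- salary | salary_rank | salary_dense_rank
--  90000 |           1 |                 1
--  80000 |           2 |                 2
--  80000 |           2 |                 2
--  70000 |           4 |                 3
-- row_num is always 1 to 4, but which of the two tied rows gets 2 and which gets 3
-- is arbitrary because the window orders by salary alone.
```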


Best Practices:

- Use DENSE_RANK when you want consecutive ranks without gaps.
- Ideal for scenarios like product rankings or performance scores.

--------------

#### 4. LEAD
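- Definition:
Accesses data from a following row in the same partition, in the order given by the window's `ORDER BY`. By default it looks one row ahead and returns NULL when there is no such row.

- Use Case:
    - Comparing a row with the next row (e.g., finding the difference between consecutive rows).
    - Identifying trends or changes in data.

Example: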

In [None]:
-- Compare each employee's salary with the next employee's salary
SELECT employee_name, salary, LEAD(salary) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;


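-- LEAD(value, offset, default): offset is the number of rows to look ahead (default 1);
-- default is the value returned when there is no such row (default NULL).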

Best Practices:

- Use `LEAD` for forward-looking comparisons.
- Combine with `PARTITION BY` for group-specific comparisons.
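As a small illustration of the optional arguments, the sketch below calls `LEAD` once with its defaults and once with an offset of 2 and a fallback value of 0. The inline rows and the alias `lead_demo` are assumptions for illustration.

```sql
SELECT employee_name, salary,
       LEAD(salary)       OVER (ORDER BY salary DESC) AS next_salary,
       LEAD(salary, 2, 0) OVER (ORDER BY salary DESC) AS salary_two_rows_ahead
FROM (VALUES
        ('Asha',  90000),
        ('Bruno', 80000),
        ('Chen',  70000)
     ) AS lead_demo(employee_name, salary)   -- hypothetical inline data
ORDER BY salary DESC;

-- employee_name | salary | next_salary | salary_two_rows_ahead
-- Asha          |  90000 |       80000 |                 70000
-- Bruno         |  80000 |       70000 |                     0
-- Chen          |  70000 |      (NULL) |                     0
```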

--------------

#### 5. LAG
- Definition:
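Accesses data from a preceding row in the same partition, in the order given by the window's `ORDER BY`. By default it looks one row back and returns NULL when there is no such row.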

- Use Case:

    - Comparing a row with the previous row (e.g., calculating differences or changes).
    - Analyzing historical trends.
    
Example:

In [None]:
-- Compare each employee's salary with the previous employee's salary
SELECT employee_name, salary, LAG(salary) OVER (ORDER BY salary DESC) AS prev_salary
FROM employees;

Best Practices:

- Use LAG for backward-looking comparisons.
- Combine with PARTITION BY for group-specific comparisons.
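For the historical-trend use case, `LAG` combined with `PARTITION BY` gives a per-employee change from one period to the next. The salary history below, including the column `pay_year` and the alias `salary_history_demo`, is made up for illustration.

```sql
SELECT employee_name, pay_year, salary,
       -- Difference from the same employee's previous year; NULL for their first year.
       salary - LAG(salary) OVER (PARTITION BY employee_name ORDER BY pay_year) AS raise
FROM (VALUES
        ('Asha',  2022, 50000),
        ('Asha',  2023, 54000),
        ('Asha',  2024, 60000),
        ('Bruno', 2023, 70000),
        ('Bruno', 2024, 70000)
     ) AS salary_history_demo(employee_name, pay_year, salary)   -- hypothetical inline data
ORDER BY employee_name, pay_year;

-- Asha:  NULL, 4000, 6000
-- Bruno: NULL, 0
```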


#### Best Practices for Window Functions
- Use PARTITION BY:
    - Divide data into logical groups for more meaningful results.
    - Example: Rank employees within each department.

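- Optimize with Indexes:
    - An index on the `PARTITION BY` and `ORDER BY` columns can let the planner read rows already sorted instead of sorting them. Check the plan with `EXPLAIN` to confirm the index is used.

- Avoid Overuse:
    - Window functions can be computationally expensive, mainly because of the sorting they require. Use them judiciously on large datasets.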

- Combine with CTEs:
    - Use Common Table Expressions (CTEs) to simplify complex queries involving window functions.

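- Use NULL Handling:
    - `LEAD` and `LAG` return NULL at the edges of a partition. Handle this explicitly, with the function's third argument (a default value) or with `COALESCE`, to avoid unexpected results.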
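Several of these practices fit in one query. The sketch below ranks employees within each department inside a CTE and then keeps the two highest salary levels per department in the outer query. The inline rows and the alias `dept_staff_demo` are assumptions for illustration; `DENSE_RANK` is used so that tied salaries are all kept.

```sql
WITH ranked AS (
    SELECT employee_name, department_id, salary,
           DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
    FROM (VALUES
            ('Asha',  10, 90000),
            ('Bruno', 10, 80000),
            ('Chen',  10, 80000),
            ('Dara',  10, 70000),
            ('Eli',   20, 60000),
            ('Fay',   20, 55000),
            ('Gus',   20, 50000)
         ) AS dept_staff_demo(employee_name, department_id, salary)   -- hypothetical inline data
)
SELECT department_id, employee_name, salary, dept_rank
FROM ranked
WHERE dept_rank <= 2
ORDER BY department_id, dept_rank, employee_name;
-- Department 10: Asha (1), Bruno (2), Chen (2). Department 20: Eli (1), Fay (2).
```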