## Window Function

### Why do we need it?

Window functions let you perform calculations across a set of rows related to the current row without collapsing those rows into groups. Here's why you might need them, illustrated with examples:

1. **Calculations with Context:**
   Window functions provide context-aware calculations for each row. They enable you to calculate values based on a window of rows defined by a partition and an order, without collapsing the result set.
   Example: Calculating the running total of sales for each day, while still showing individual sales records.

2. **Aggregate Functions without Grouping:**
   Window functions allow you to compute aggregate values without using the `GROUP BY` clause. This is helpful when you want to maintain the granularity of the individual rows in your result.
   Example: Showing each employee's row alongside the average salary of their department, without collapsing the rows with `GROUP BY`.

3. **Ranking and Percentiles:**
   Window functions enable ranking and percentile calculations, which help you understand where a particular value stands within a set of data.
   Example: Finding the top 5 students with the highest marks in each department.

4. **Moving Averages and Trends:**
   Window functions can be used to calculate moving averages, which reveal trends in data over a specific window or period.
   Example: Computing a 7-day moving average of stock prices to identify short-term trends.

5. **Top-N and Bottom-N:**
   Window functions allow you to identify the top or bottom values within a partition without grouping the data.
   Example: Finding the top 3 customers by revenue for each product category.

6. **Lag and Lead Analysis:**
   Window functions can be used to access data from previous or subsequent rows, enabling time-based or sequential analysis.
   Example: Analyzing the change in stock prices from the previous day to identify patterns.

7. **Complex Ranking Logic:**
   Window functions provide flexibility in ranking logic, allowing you to handle ties and gaps more effectively than traditional ranking methods.
   Example: Ranking students by score and choosing how ties are treated: shared ranks with gaps (`RANK()`), shared ranks without gaps (`DENSE_RANK()`), or a unique number per row (`ROW_NUMBER()`).

8. **Comparative Analysis:**
   Window functions enable easy comparison between different rows, helping you identify outliers, trends, and patterns.
   Example: Comparing an employee's current salary with the average salary in their department.

9. **Efficiency and Performance:**
   Window functions can be more efficient than self-joins or subqueries when performing calculations that involve related rows.
   Example: Calculating the running total or cumulative sum without resorting to complex self-join queries.

10. **Simplifying Complex Queries:**
    Window functions can often simplify complex queries, making them more readable and maintainable.
    Example: Consolidating a multi-step process into a single query for calculating year-to-date sales for each product.

In essence, window functions let you perform intricate calculations and analysis on data while keeping the granularity of individual rows. They often remove the need for repetitive subqueries or self-joins, which tends to make queries shorter and easier to read.
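Several of the use cases above (running totals, lag analysis, moving averages) can be sketched in one query. The following is a minimal illustration, not a definitive implementation: the `DailySales` table, its columns, and its rows are invented for this example, and the script assumes Python's bundled SQLite is version 3.25 or newer, which is when SQLite gained window functions.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE DailySales (SaleDate TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO DailySales VALUES (?, ?)",
    [
        ("2024-01-01", 100),
        ("2024-01-02", 150),
        ("2024-01-03", 90),
        ("2024-01-04", 120),
    ],
)

query = """
SELECT
    SaleDate,
    Amount,
    -- Running total: every row up to and including the current one
    SUM(Amount) OVER (ORDER BY SaleDate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
    -- Change versus the previous day (NULL on the first row)
    Amount - LAG(Amount) OVER (ORDER BY SaleDate) AS ChangeFromPrevDay,
    -- Moving average over the current row and the two rows before it
    AVG(Amount) OVER (ORDER BY SaleDate
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg3
FROM DailySales
ORDER BY SaleDate
"""

sales_rows = conn.execute(query).fetchall()
for row in sales_rows:
    print(row)
```

Every input row is still present in the output; the window functions only add columns. A 7-day moving average would use `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` in the same way.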

In the context of calculating the average marks of students in each department, you can use a window function to compute the average per department while keeping one output row per student. Let's say you have a table named "StudentMarks" with the following columns:

- StudentID
- Department
- Marks

You can use a window function to calculate the average marks for each department. Here's a basic example using SQL:

```sql
SELECT
    Department,
    AVG(Marks) OVER (PARTITION BY Department) AS AvgMarks
FROM
    StudentMarks;
```

In this example, the `AVG` function is used as a window function with the `OVER` clause. The `PARTITION BY` clause divides the result set into partitions based on the "Department" column, and the `AVG(Marks)` function calculates the average marks within each department's partition. The result will include a column showing the average marks for each student's department.

Let's say you have the following data in your "StudentMarks" table:

| StudentID | Department | Marks |
|-----------|------------|-------|
| 1         | Math       | 85    |
| 2         | Science    | 76    |
| 3         | Math       | 92    |
| 4         | Science    | 81    |
| 5         | Math       | 78    |

The query would produce the following result:

| Department | AvgMarks |
|------------|----------|
| Math       | 85       |
| Science    | 78.5     |
| Math       | 85       |
| Science    | 78.5     |
| Math       | 85       |

Note that the window function doesn't group the rows in the same way as a regular `GROUP BY` clause. Instead, it performs calculations for each row in the result set, considering the defined partition. If you want to avoid duplicate rows, you can use the `DISTINCT` keyword in the `SELECT` statement.
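To see the difference between the per-row result and the `DISTINCT` variant, here is a small sketch that loads the sample data into an in-memory SQLite database and runs both queries. It assumes Python's bundled SQLite is 3.25 or newer; the `ORDER BY` clauses are additions made only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE StudentMarks (StudentID INTEGER, Department TEXT, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO StudentMarks VALUES (?, ?, ?)",
    [(1, "Math", 85), (2, "Science", 76), (3, "Math", 92),
     (4, "Science", 81), (5, "Math", 78)],
)

# One output row per student; the department average repeats on each row.
per_row = conn.execute("""
    SELECT Department,
           AVG(Marks) OVER (PARTITION BY Department) AS AvgMarks
    FROM StudentMarks
    ORDER BY StudentID
""").fetchall()

# DISTINCT removes the repeated (Department, AvgMarks) pairs.
distinct_rows = conn.execute("""
    SELECT DISTINCT Department,
           AVG(Marks) OVER (PARTITION BY Department) AS AvgMarks
    FROM StudentMarks
    ORDER BY Department
""").fetchall()

print(per_row)        # five rows, one per student
print(distinct_rows)  # [('Math', 85.0), ('Science', 78.5)]
```

If only one row per department is wanted, a plain `GROUP BY Department` with `AVG(Marks)` gives the same two rows and is usually the more direct choice.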

Remember that the syntax and capabilities of window functions may vary depending on the database management system you're using. This example demonstrates a basic use case to calculate the average marks using a window function.

## Find the highest and lowest marks using window function

Assuming you have the same "StudentMarks" table with columns:

- StudentID
- Department
- Marks

You can use the `MAX` and `MIN` window functions along with the `OVER` clause to find the highest and lowest marks for each department:

```sql
SELECT
    StudentID,
    Department,
    Marks,
    MAX(Marks) OVER (PARTITION BY Department) AS HighestMark,
    MIN(Marks) OVER (PARTITION BY Department) AS LowestMark
FROM
    StudentMarks;
```

Let's use the following sample data:

| StudentID | Department | Marks |
|-----------|------------|-------|
| 1         | Math       | 85    |
| 2         | Science    | 76    |
| 3         | Math       | 92    |
| 4         | Science    | 81    |
| 5         | Math       | 78    |

The query would produce the following result:

| StudentID | Department | Marks | HighestMark | LowestMark |
|-----------|------------|-------|-------------|------------|
| 1         | Math       | 85    | 92          | 78         |
| 2         | Science    | 76    | 81          | 76         |
| 3         | Math       | 92    | 92          | 78         |
| 4         | Science    | 81    | 81          | 76         |
| 5         | Math       | 78    | 92          | 78         |

In this example, the `MAX` and `MIN` window functions are used with the `OVER` clause partitioned by the "Department" column. This calculates the highest and lowest marks within each department's partition for each row.
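Because the per-department maximum is available on every row, it can be used in ordinary arithmetic. The sketch below adds a `GapToTop` column (a name invented for this example) showing how far each student is from their department's best mark. It also uses a named `WINDOW` clause to avoid repeating the partition definition; SQLite, PostgreSQL, and MySQL 8 accept this clause, but support varies across other systems, so treat it as an assumption about your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE StudentMarks (StudentID INTEGER, Department TEXT, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO StudentMarks VALUES (?, ?, ?)",
    [(1, "Math", 85), (2, "Science", 76), (3, "Math", 92),
     (4, "Science", 81), (5, "Math", 78)],
)

gap_rows = conn.execute("""
    SELECT
        StudentID,
        Department,
        Marks,
        MAX(Marks) OVER dept AS HighestMark,
        MIN(Marks) OVER dept AS LowestMark,
        -- GapToTop is a hypothetical column added for illustration
        MAX(Marks) OVER dept - Marks AS GapToTop
    FROM StudentMarks
    WINDOW dept AS (PARTITION BY Department)  -- named window, reused above
    ORDER BY StudentID
""").fetchall()

for row in gap_rows:
    print(row)
# (1, 'Math', 85, 92, 78, 7)
# (2, 'Science', 76, 81, 76, 5)
# (3, 'Math', 92, 92, 78, 0)
# (4, 'Science', 81, 81, 76, 0)
# (5, 'Math', 78, 92, 78, 14)
```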

Just like in the previous example, remember that the syntax and functionalities of window functions might differ based on the specific database management system you are using.

### Find students who scored above their department's average using a window function

To find all the students whose marks are higher than the average marks in their department using a window function, you can use the following SQL query:

```sql
SELECT
    StudentID,
    Department,
    Marks,
    AvgMarksInDepartment
FROM (
    -- Window functions cannot appear in WHERE, so compute the average first
    SELECT
        StudentID,
        Department,
        Marks,
        AVG(Marks) OVER (PARTITION BY Department) AS AvgMarksInDepartment
    FROM
        StudentMarks
) AS Scored
WHERE
    Marks > AvgMarksInDepartment;
```

In this query, the inner query computes each department's average with the `AVG` window function, and the outer query keeps only the rows where the student's marks exceed that average. The extra level is required because window functions are evaluated after the `WHERE` clause, so a window function used directly in `WHERE` is rejected as an error.

With the sample "StudentMarks" data used above, the query returns student 3 (Math, 92, above the Math average of 85) and student 4 (Science, 81, above the Science average of 78.5).
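Window functions are evaluated after the `WHERE` clause, so filtering on a window result means computing it in a subquery or common table expression first. The sketch below, which assumes SQLite 3.25 or newer as shipped with Python, first shows that a window function placed directly in `WHERE` is rejected, then runs a CTE version. The CTE name `Scored` is an arbitrary choice for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE StudentMarks (StudentID INTEGER, Department TEXT, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO StudentMarks VALUES (?, ?, ?)",
    [(1, "Math", 85), (2, "Science", 76), (3, "Math", 92),
     (4, "Science", 81), (5, "Math", 78)],
)

# A window function directly in WHERE is an error.
direct_filter_error = None
try:
    conn.execute("""
        SELECT StudentID
        FROM StudentMarks
        WHERE Marks > AVG(Marks) OVER (PARTITION BY Department)
    """)
except sqlite3.Error as exc:
    direct_filter_error = exc
print("Direct filter failed:", direct_filter_error)

# Compute the average in a CTE, then filter in the outer query.
above_avg = conn.execute("""
    WITH Scored AS (
        SELECT StudentID, Department, Marks,
               AVG(Marks) OVER (PARTITION BY Department) AS AvgMarksInDepartment
        FROM StudentMarks
    )
    SELECT StudentID, Department, Marks, AvgMarksInDepartment
    FROM Scored
    WHERE Marks > AvgMarksInDepartment
    ORDER BY StudentID
""").fetchall()

print(above_avg)  # [(3, 'Math', 92, 85.0), (4, 'Science', 81, 78.5)]
```

Student 1 scores exactly the Math average of 85 and is excluded because the comparison is strict (`>`).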

## Rank Function

The `RANK()` window function is used to assign a rank to each row in a result set based on a specified ordering. It's like giving each row a position in a competition where the ranks represent how well a row performs compared to others.

Imagine you have a table of students and their scores in a quiz. You want to assign ranks to these students based on their scores, where a higher score gets a better rank. Let's take a look at how you would achieve this using the `RANK()` window function.

Assume you have a table named "QuizScores" with the following columns:

- StudentName
- Score

Here's an example query:

```sql
SELECT
    StudentName,
    Score,
    RANK() OVER (ORDER BY Score DESC) AS Ranking
FROM
    QuizScores;
```

In this query:

- `StudentName` is the name of the student.
- `Score` is the score they achieved in the quiz.
- `RANK() OVER (ORDER BY Score DESC)` calculates the rank of each student based on their score, with the highest score getting the best (lowest) rank. The `DESC` keyword specifies that the ordering should be in descending order.

Let's say the "QuizScores" table has the following data:

| StudentName | Score |
|-------------|-------|
| Alice       | 85    |
| Bob         | 92    |
| Carol       | 78    |
| David       | 92    |
| Emily       | 76    |

The query would produce the following result:

| StudentName | Score | Ranking |
|-------------|-------|---------|
| Bob         | 92    | 1       |
| David       | 92    | 1       |
| Alice       | 85    | 3       |
| Carol       | 78    | 4       |
| Emily       | 76    | 5       |

In this example, Bob and David both have the highest score (92) and are tied for rank 1. Alice has the next-highest score, but because two rows are ranked ahead of her, `RANK()` skips rank 2 and assigns her rank 3. Carol and Emily follow with ranks 4 and 5.

The `RANK()` window function is helpful when you want to assign ranks to rows based on a certain criterion and understand their relative positions in a sorted list. It's particularly useful in scenarios like competition rankings, leaderboards, and ranking students based on their performance.
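`RANK()` can also be combined with `PARTITION BY` so that the ranking restarts in each group, which is how the "top N per department" use case mentioned earlier is usually handled. Here is a minimal sketch using the "StudentMarks" sample data from the earlier sections; the cutoff of 2 and the names `DeptRank` and `Ranked` are choices made for this example, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE StudentMarks (StudentID INTEGER, Department TEXT, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO StudentMarks VALUES (?, ?, ?)",
    [(1, "Math", 85), (2, "Science", 76), (3, "Math", 92),
     (4, "Science", 81), (5, "Math", 78)],
)

top_two = conn.execute("""
    SELECT StudentID, Department, Marks, DeptRank
    FROM (
        SELECT StudentID, Department, Marks,
               -- The rank restarts at 1 inside each department
               RANK() OVER (PARTITION BY Department
                            ORDER BY Marks DESC) AS DeptRank
        FROM StudentMarks
    ) AS Ranked
    WHERE DeptRank <= 2          -- filter in the outer query, not the inner one
    ORDER BY Department, DeptRank
""").fetchall()

for row in top_two:
    print(row)
# (3, 'Math', 92, 1)
# (1, 'Math', 85, 2)
# (4, 'Science', 81, 1)
# (2, 'Science', 76, 2)
```

With `RANK()`, a tie at the cutoff can return more than two rows per department. If exactly two rows are required, `ROW_NUMBER()` with a tiebreaker column is the usual alternative.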

## Dense Rank Function

The `DENSE_RANK()` window function is similar to the `RANK()` function, but it doesn't leave gaps in the ranking sequence when there are ties. Tied rows share the same rank, and the next distinct value receives the next consecutive rank. Let's continue with the same example of ranking students based on their quiz scores:

Assuming you have the same "QuizScores" table with the columns "StudentName" and "Score," here's how you can use the `DENSE_RANK()` function:

```sql
SELECT
    StudentName,
    Score,
    DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank
FROM
    QuizScores;
```

Using the same data:

| StudentName | Score |
|-------------|-------|
| Alice       | 85    |
| Bob         | 92    |
| Carol       | 78    |
| David       | 92    |
| Emily       | 76    |

The query would produce the following result:

| StudentName | Score | DenseRank |
|-------------|-------|-----------|
| Bob         | 92    | 1         |
| David       | 92    | 1         |
| Alice       | 85    | 2         |
| Carol       | 78    | 3         |
| Emily       | 76    | 4         |

In this example, you can see that both Bob and David have the highest score of 92 and are tied for the first dense rank. The next student, Alice, gets the second dense rank, and so on. Unlike the regular `RANK()` function, there are no gaps in the dense rank sequence for tied rows. Each rank is assigned consecutively.

The `DENSE_RANK()` window function is particularly useful when you want to assign ranks in a way that accounts for ties and doesn't skip rank values for tied rows. It's commonly used in scenarios where you want to create leaderboards or ranking lists where tied participants receive the same rank.
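Putting `RANK()` and `DENSE_RANK()` in the same query makes the difference easy to see. The following sketch loads the "QuizScores" sample data into an in-memory SQLite database (3.25 or newer assumed); the outer `ORDER BY Score DESC, StudentName` is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE QuizScores (StudentName TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO QuizScores VALUES (?, ?)",
    [("Alice", 85), ("Bob", 92), ("Carol", 78), ("David", 92), ("Emily", 76)],
)

rank_rows = conn.execute("""
    SELECT
        StudentName,
        Score,
        RANK()       OVER (ORDER BY Score DESC) AS Ranking,    -- gaps after ties
        DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank   -- no gaps
    FROM QuizScores
    ORDER BY Score DESC, StudentName
""").fetchall()

for row in rank_rows:
    print(row)
# ('Bob', 92, 1, 1)
# ('David', 92, 1, 1)
# ('Alice', 85, 3, 2)
# ('Carol', 78, 4, 3)
# ('Emily', 76, 5, 4)
```

The two columns agree until the first tie; after it, `Ranking` jumps to 3 while `DenseRank` continues with 2.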

## ROW_NUMBER Function

The `ROW_NUMBER()` window function assigns a unique sequential integer to each row within a partition (or within the whole result set when no `PARTITION BY` is given), following the `ORDER BY` in the `OVER` clause. Unlike `RANK()` or `DENSE_RANK()`, it does not give tied rows the same number; every row receives a distinct number. Let's use the same example of ranking students based on their quiz scores:

Assuming you have the same "QuizScores" table with the columns "StudentName" and "Score," here's how you can use the `ROW_NUMBER()` function:

```sql
SELECT
    StudentName,
    Score,
    ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNumber
FROM
    QuizScores;
```

Using the same data:

| StudentName | Score |
|-------------|-------|
| Alice       | 85    |
| Bob         | 92    |
| Carol       | 78    |
| David       | 92    |
| Emily       | 76    |

The query would produce the following result:

| StudentName | Score | RowNumber |
|-------------|-------|-----------|
| Bob         | 92    | 1         |
| David       | 92    | 2         |
| Alice       | 85    | 3         |
| Carol       | 78    | 4         |
| Emily       | 76    | 5         |

In this example, the `ROW_NUMBER()` function assigns a unique sequential number to each row based on the ordering of scores in descending order. Tied rows do not share a number, which is why Bob and David have different row numbers despite having the same score. Because the `ORDER BY` contains only `Score`, which of the two receives 1 is not guaranteed; adding a tiebreaker column (for example, `ORDER BY Score DESC, StudentName`) makes the numbering deterministic.

The `ROW_NUMBER()` window function is useful when you want a simple sequential identifier for each row in the result set. It's commonly used for paginating results or selecting a fixed number of top records. For the numbers to be repeatable across runs, the ordering must be deterministic.
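As a sketch of the pagination use case, the example below numbers the "QuizScores" rows and then selects one page at a time. The helper `fetch_page`, the page size of 2, and the `StudentName` tiebreaker are assumptions made for this illustration; the tiebreaker is what keeps Bob and David in a fixed order between calls. SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE QuizScores (StudentName TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO QuizScores VALUES (?, ?)",
    [("Alice", 85), ("Bob", 92), ("Carol", 78), ("David", 92), ("Emily", 76)],
)

PAGE_SIZE = 2  # hypothetical page size for this example


def fetch_page(page_number):
    """Return the rows for a 1-based page number, ordered by row number."""
    first = (page_number - 1) * PAGE_SIZE + 1
    last = page_number * PAGE_SIZE
    return conn.execute(
        """
        SELECT StudentName, Score, RowNumber
        FROM (
            SELECT StudentName, Score,
                   -- StudentName breaks ties so the numbering is repeatable
                   ROW_NUMBER() OVER (ORDER BY Score DESC, StudentName) AS RowNumber
            FROM QuizScores
        ) AS Numbered
        WHERE RowNumber BETWEEN ? AND ?
        ORDER BY RowNumber
        """,
        (first, last),
    ).fetchall()


page_one = fetch_page(1)
page_two = fetch_page(2)
page_three = fetch_page(3)
print(page_one)    # [('Bob', 92, 1), ('David', 92, 2)]
print(page_two)    # [('Alice', 85, 3), ('Carol', 78, 4)]
print(page_three)  # [('Emily', 76, 5)]
```

As with the other examples, the row number is computed in an inner query and filtered in the outer one, since a window function cannot be referenced directly in `WHERE`.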