
# 🔗[LeetCode 176 : Second Highest Salary](https://leetcode.com/problems/second-highest-salary/)


---
 🔹 Problem Statement

**Table: Employee**

| Column Name | Type |
|-------------|------|
| id          | int  |
| salary      | int  |

- `id` is the primary key.
- Each row in this table contains the salary of an employee.

---

 ✏️ Task

Write a SQL query to find the **second highest distinct salary** from the `Employee` table.  
If there is no second highest salary, return `null`.

---

 📥 Example 1

**Input:**

**Employee**

| id | salary |
|----|--------|
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

**Output:**

| SecondHighestSalary |
|---------------------|
| 200                 |

---

 📥 Example 2

**Input:**

**Employee**

| id | salary |
|----|--------|
| 1  | 100    |

**Output:**

| SecondHighestSalary |
|---------------------|
| null                |

---

 ✅ SQL Solution

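 - `OFFSET 1` skips the highest distinct salary, and `LIMIT 1` keeps the next one.
 - Wrapping the query in an outer `SELECT (...)` turns it into a scalar subquery. A scalar subquery yields `NULL` when no row exists.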
```sql
SELECT (
    SELECT DISTINCT salary
    FROM Employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
) AS SecondHighestSalary;
```

 - The query below is **incorrect**. When there is no second distinct salary, it returns an empty result set instead of a row containing `NULL`.
```sql
SELECT DISTINCT salary AS SecondHighestSalary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
```
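To show the difference, here is a minimal sketch that runs both queries against the one-row table from Example 2. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite accepts the same `LIMIT 1 OFFSET 1` syntax and supports scalar subqueries. `make_db` is a hypothetical helper and is not part of the problem.

```python
import sqlite3

def make_db(salaries):
    """Build an in-memory Employee table (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany(
        "INSERT INTO Employee (id, salary) VALUES (?, ?)",
        list(enumerate(salaries, start=1)),
    )
    return conn

# Scalar-subquery version: always returns exactly one row.
WRAPPED = """
SELECT (
    SELECT DISTINCT salary
    FROM Employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
) AS SecondHighestSalary
"""

# Bare version: returns zero rows when there is no second salary.
BARE = """
SELECT DISTINCT salary AS SecondHighestSalary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1
"""

conn = make_db([100])  # Example 2: only one salary
print(conn.execute(WRAPPED).fetchall())  # [(None,)]
print(conn.execute(BARE).fetchall())     # []
```

The wrapped query always produces exactly one row, which matches the shape of the expected output.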

# 🔗[LeetCode 177 : Nth Highest Salary](https://leetcode.com/problems/nth-highest-salary/)

---

 🔹 Problem Statement

**Table: Employee**

| Column Name | Type |
| ----------- | ---- |
| id          | int  |
| salary      | int  |

* `id` is the primary key.
* Each row in this table contains the salary of an employee.

---

 ✏️ Task

Write a SQL query to find the **nth highest distinct salary** from the `Employee` table.
If there are fewer than `n` distinct salaries, return `null`.

---

 📥 Example 1

**Input:**

**Employee**

| id | salary |
| -- | ------ |
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

n = 2

**Output:**

| getNthHighestSalary(2) |
| ---------------------- |
| 200                    |

---

 📥 Example 2

**Input:**

**Employee**

| id | salary |
| -- | ------ |
| 1  | 100    |

n = 2

**Output:**

| getNthHighestSalary(2) |
| ---------------------- |
| null                   |

---

 ✅ SQL Solution

* Uses `LIMIT` and `OFFSET` to skip the top (n-1) distinct salaries and fetch the nth one.
* Wrapping the query in `RETURN (...)` makes it a scalar subquery, so the function returns `null` when the nth salary doesn't exist.

```sql
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  SET N = N - 1;
  RETURN (
    SELECT DISTINCT salary
    FROM Employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET N
  );
END
```
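SQLite has no `CREATE FUNCTION` statement. The sketch below therefore imitates the stored function with a hypothetical Python helper, `nth_highest_salary(conn, n)`, built on the `sqlite3` module. The helper computes the offset `n - 1` in Python before the query runs, which plays the role of `SET N = N - 1`, and passes it as a bound parameter. Treating `n < 1` as "no such salary" is an assumption of this sketch.

```python
import sqlite3

def nth_highest_salary(conn, n):
    """Hypothetical Python stand-in for getNthHighestSalary(N)."""
    if n < 1:
        return None  # assumption: non-positive n means "no such salary"
    offset = n - 1  # same role as SET N = N - 1 in the MySQL function
    row = conn.execute(
        """
        SELECT (
            SELECT DISTINCT salary
            FROM Employee
            ORDER BY salary DESC
            LIMIT 1 OFFSET ?
        )
        """,
        (offset,),
    ).fetchone()
    return row[0]  # the scalar subquery yields NULL (None) when missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])
print(nth_highest_salary(conn, 2))  # 200
print(nth_highest_salary(conn, 4))  # None
```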

---

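- The expected output column is `getNthHighestSalary(n)`, so the answer must be written as a function rather than as a standalone query.
- Inside the function we cannot write `OFFSET N - 1` directly. MySQL's `LIMIT`/`OFFSET` accept only integer constants or, inside a stored routine, integer parameters and local variables. They do not accept expressions, which is why the function above runs `SET N = N - 1` first. The following is therefore **invalid**:
```sql
-- Invalid in MySQL: OFFSET does not accept the expression N - 1
SELECT DISTINCT salary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET N - 1;
```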

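- `DENSE_RANK()` version (needs MySQL 8.0+ for window functions):
```sql
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  RETURN (
    SELECT salary
    FROM (
      -- Tied salaries get the same rank, with no gaps between ranks
      SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
      FROM Employee
    ) ranked
    WHERE rnk = N
    LIMIT 1  -- several rows can share rank N; return just one
  );
END
```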
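The snippet below runs the same `DENSE_RANK()` subquery in SQLite through Python's `sqlite3` module. Window functions need SQLite 3.25 or newer. The data is a hypothetical table where two employees tie at the top salary, so rank 1 matches two rows. In MySQL, a scalar subquery that returns more than one row raises an error, and the `LIMIT 1` prevents that. The outer `SELECT (...)` stands in for `RETURN (...)`, and `nth_highest_dense_rank` is an illustrative helper name.

```python
import sqlite3

# DENSE_RANK() version of the nth-highest query.
DENSE_RANK_QUERY = """
SELECT (
    SELECT salary
    FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM Employee
    ) AS ranked
    WHERE rnk = ?
    LIMIT 1  -- tied salaries share a rank, so keep a single row
)
"""

def nth_highest_dense_rank(conn, n):
    """Hypothetical helper: nth highest distinct salary via DENSE_RANK()."""
    return conn.execute(DENSE_RANK_QUERY, (n,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
# Two employees tie at 300, so 200 is still the 2nd highest distinct salary.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 300), (2, 300), (3, 200), (4, 100)],
)
print([nth_highest_dense_rank(conn, n) for n in range(1, 5)])  # [300, 200, 100, None]
```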


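| Data Size        | Recommended Query                  | Notes                                                                                                                   |
| ---------------- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| Small (< 100 GB) | `LIMIT` + `OFFSET`                 | Fast and lightweight; with an index on `salary`, the engine can read salaries in descending order and stop early         |
| Large (> 100 GB) | `DENSE_RANK()` via CTE or subquery | Handles ties explicitly and extends to per-group rankings with `PARTITION BY`; it still ranks, and so sorts, every row |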




# 🔗[LeetCode 178 : Rank Scores](https://leetcode.com/problems/rank-scores/)



---

 🔹 Problem Statement

**Table: Scores**

| Column Name | Type    |
| ----------- | ------- |
| id          | int     |
| score       | decimal |

* `id` is the primary key.
* Each row contains a score from a game.
* `score` is a floating point number with two decimal places.

---

 ✏️ Task

Write a SQL query to rank scores from highest to lowest using the following rules:

* Scores are ranked in **descending order**.
* If two scores are the same, they receive the **same rank**.
* After a tie, the next rank should be the **next integer** (no gaps).

Return the result table **ordered by score descending**.

---

 📥 Example

**Input:**

**Scores**

| id | score |
| -- | ----- |
| 1  | 3.50  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.85  |
| 5  | 4.00  |

**Output:**

| score | rank |
| ----- | ---- |
| 4.00  | 1    |
| 4.00  | 1    |
| 3.85  | 2    |
| 3.65  | 3    |
| 3.50  | 4    |

---

 ✅ SQL Solution

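* Uses `DENSE_RANK()` to handle ties with no gaps.
* Orders results by `score` descending. The `ORDER BY` inside `OVER (...)` only controls how ranks are assigned. The final `ORDER BY` is what guarantees the output order.

```sql
SELECT score,
       DENSE_RANK() OVER (ORDER BY score DESC) AS `rank`  -- `rank` is a reserved word in MySQL 8.0+
FROM Scores
ORDER BY score DESC;
```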
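Here is a minimal check of the query on the example data. It uses Python's built-in `sqlite3` module in place of MySQL. The schema stores `score` as `REAL` because SQLite has no true `decimal` type (an assumption of this sketch). `rank` is quoted with double quotes, which is SQLite's standard identifier quoting, instead of MySQL backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00)],
)

# DENSE_RANK gives ties the same rank; the outer ORDER BY fixes output order.
rows = conn.execute(
    """
    SELECT score,
           DENSE_RANK() OVER (ORDER BY score DESC) AS "rank"
    FROM Scores
    ORDER BY score DESC
    """
).fetchall()

for score, rank in rows:
    print(f"{score:.2f} {rank}")
```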

---



# 🔗[LeetCode 180 : Consecutive Numbers](https://leetcode.com/problems/consecutive-numbers/)

---

 🔹 Problem Statement

**Table: Logs**

| Column Name | Type |
| ----------- | ---- |
| id          | int  |
| num         | int  |

* `id` is the unique identifier representing order.
* `num` is the number recorded.

---

 ✏️ Task

Find all numbers that appear **at least 3 times consecutively** in the `Logs` table.

---

 📥 Example

**Input:**

| id | num |
| -- | --- |
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 2   |
| 6  | 3   |

**Output:**

| ConsecutiveNums |
| --------------- |
| 1               |

---

 ✅ SQL Solutions

---

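 Solution 1: Using Self-JOINs

This assumes `id` has no gaps. Each row is joined only to the rows whose `id` is exactly 1 and 2 larger.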

```sql
SELECT DISTINCT L1.Num AS ConsecutiveNums
FROM Logs L1
JOIN Logs L2 ON L1.Id = L2.Id - 1
JOIN Logs L3 ON L1.Id = L3.Id - 2
WHERE L1.Num = L2.Num AND L2.Num = L3.Num;
```
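Here is a minimal run of the self-join on the example rows. It uses Python's built-in `sqlite3` module in place of the LeetCode MySQL environment. The join conditions `L1.id = L2.id - 1` and `L1.id = L3.id - 2` only find runs whose `id` values are consecutive integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logs (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO Logs VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)],
)

SELF_JOIN = """
SELECT DISTINCT L1.num AS ConsecutiveNums
FROM Logs L1
JOIN Logs L2 ON L1.id = L2.id - 1   -- L2 is the next row
JOIN Logs L3 ON L1.id = L3.id - 2   -- L3 is the row after that
WHERE L1.num = L2.num AND L2.num = L3.num
"""
print(conn.execute(SELF_JOIN).fetchall())  # [(1,)]
```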

---

 Solution 2: Using Window Functions (`LEAD` and `LAG`)

```sql
WITH L_L AS (
    SELECT num AS ConsecutiveNums,
           LEAD(num) OVER (ORDER BY id) AS Lead_val,
           LAG(num) OVER (ORDER BY id) AS Lag_val
    FROM Logs
)
SELECT DISTINCT ConsecutiveNums
FROM L_L
WHERE Lead_val = ConsecutiveNums AND Lag_val = ConsecutiveNums;
```
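The two solutions agree on the example, but they differ when `id` has gaps. `LEAD`/`LAG` compare neighboring rows in `id` order, while the self-join requires `id`, `id + 1` and `id + 2` to all exist. The sketch below shows this with Python's built-in `sqlite3` module; window functions need SQLite 3.25+. The `gapped` rows and the `run` helper are hypothetical.

```python
import sqlite3

LEAD_LAG = """
WITH L_L AS (
    SELECT num AS ConsecutiveNums,
           LEAD(num) OVER (ORDER BY id) AS Lead_val,
           LAG(num)  OVER (ORDER BY id) AS Lag_val
    FROM Logs
)
SELECT DISTINCT ConsecutiveNums
FROM L_L
WHERE Lead_val = ConsecutiveNums AND Lag_val = ConsecutiveNums
"""

SELF_JOIN = """
SELECT DISTINCT L1.num
FROM Logs L1
JOIN Logs L2 ON L1.id = L2.id - 1
JOIN Logs L3 ON L1.id = L3.id - 2
WHERE L1.num = L2.num AND L2.num = L3.num
"""

def run(rows, query):
    """Load Logs rows into a fresh in-memory DB and return sorted results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Logs (id INTEGER PRIMARY KEY, num INTEGER)")
    conn.executemany("INSERT INTO Logs VALUES (?, ?)", rows)
    return sorted(r[0] for r in conn.execute(query))

example = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]
gapped = [(1, 7), (2, 7), (5, 7)]  # hypothetical: ids 3 and 4 are missing

print(run(example, LEAD_LAG))   # [1]
print(run(gapped, LEAD_LAG))    # [7]  neighbors in id order still match
print(run(gapped, SELF_JOIN))   # []   needs ids n, n+1, n+2
```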

---

 ⚡ Performance & Optimization Notes

| Approach             | Best For                       | Why?                                                                                                                                                                                     |
| -------------------- | ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Self-JOINs**       | Smaller datasets (< 100K rows) | Simple to read; each extra join is a lookup on the primary key `id`, which stays cheap while the table is small.                                                                          |
| **Window Functions** | Larger datasets (100K+ rows)   | One ordered pass over the table instead of two extra joins. In distributed engines such as Spark SQL, a window with no `PARTITION BY` is computed in a single partition, so check the plan. |

---

 🔑 Summary

* For **small datasets**, self-joins are easy to read and fast enough.
* For **large datasets**, window functions (`LEAD`/`LAG`) read the table once instead of joining it to itself twice, which usually scales better.



# 🔗[LeetCode 181 : Employees Earning More Than Their Managers](https://leetcode.com/problems/employees-earning-more-than-their-managers/)

---

 🔹 Problem Statement

**Table: Employee**

| Column Name | Type    |
| ----------- | ------- |
| id          | int     |
| name        | varchar |
| salary      | int     |
| managerId   | int     |

* `id` is the primary key.
* `managerId` is the id of the employee’s manager.
* Each row contains an employee’s information including their salary and manager.

---

 ✏️ Task

Write a SQL query to find the names of employees who earn more than their managers.

---

 📥 Example

**Input:**

**Employee**

| id | name  | salary | managerId |
| -- | ----- | ------ | --------- |
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
| 5  | Janet | 69000  | 3         |
| 6  | Randy | 85000  | 4         |

**Output:**

| name  |
| ----- |
| Joe   |
| Janet |

---

 ✅ SQL Solution

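* Self-join the `Employee` table so that each employee row is paired with its manager's row, then keep the rows where the employee earns more. The inner join drops employees whose `managerId` is `NULL`.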

```sql
SELECT e.name
FROM Employee e
JOIN Employee m ON e.managerId = m.id
WHERE e.salary > m.salary;
```
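Here is a quick check on the example rows. It uses Python's built-in `sqlite3` module in place of MySQL. The `ORDER BY e.id` is added only to make the printed order stable. Joe (70000) and Janet (69000) both earn more than their manager Sam (60000). Henry (80000) and Randy (85000) earn less than Max (90000).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, managerId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, "Joe", 70000, 3),
        (2, "Henry", 80000, 4),
        (3, "Sam", 60000, None),
        (4, "Max", 90000, None),
        (5, "Janet", 69000, 3),
        (6, "Randy", 85000, 4),
    ],
)

rows = conn.execute(
    """
    SELECT e.name
    FROM Employee e
    JOIN Employee m ON e.managerId = m.id   -- m is e's manager
    WHERE e.salary > m.salary
    ORDER BY e.id                           -- added only for stable output
    """
).fetchall()
print([name for (name,) in rows])  # ['Joe', 'Janet']
```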

---


# 🔗[LeetCode 182 : Duplicate Emails](https://leetcode.com/problems/duplicate-emails/)

---

 🔹 Problem Statement

**Table: Person**

| Column Name | Type    |
| ----------- | ------- |
| id          | int     |
| email       | varchar |

* `id` is the primary key.
* Each row contains the email of a person.

---

 ✏️ Task

Write a SQL query to find all **duplicate emails** in the `Person` table.
Return the emails that appear **more than once**.

---

 📥 Example

**Input:**

**Person**

| id | email            |
| -- | ---------------- |
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |

**Output:**

| Email            |
| ---------------- |
| john@example.com |

---

 ✅ SQL Solution

* Use `GROUP BY` and `HAVING` to find emails appearing more than once.

```sql
SELECT email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;
```
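Here is a minimal run on the example rows. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. `GROUP BY` collapses the rows into one group per email, and `HAVING` keeps only the groups with more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")],
)

rows = conn.execute(
    """
    SELECT email
    FROM Person
    GROUP BY email          -- one group per distinct email
    HAVING COUNT(email) > 1 -- keep groups with more than one row
    """
).fetchall()
print(rows)  # [('john@example.com',)]
```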

---

 ⚙️ Behavior on Different Dataset Sizes

| Dataset Size                | Behavior & Considerations                                                                                                                                                                                                                                  |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **< 100 GB (Small/Medium)** | Query runs efficiently if there's an index on `email`. `GROUP BY` and aggregation execute quickly. Can be handled in-memory or with modest disk I/O.                                                                                                       |
| **> 100 GB (Large)**        | Requires distributed processing or optimized execution. Without indexes, grouping large datasets can be expensive (high I/O, memory usage). Use partitioning, indexing, or distributed SQL engines (e.g., Hive, Presto, Spark SQL) to improve performance. |

---



https://leetcode.com/problems/customers-who-never-order/