# Advanced Pandas 3

In [None]:
import pandas as pd

### Problem: 177. Nth Highest Salary

#### Difficulty: Medium  
#### Topics: SQL, Pandas  

---

### Table Schema

#### Employee Table:

| Column Name | Type  |
|-------------|-------|
| id          | int   |
| salary      | int   |

- `id` is the primary key (column with unique values) for this table.
- Each row of this table contains information about the salary of an employee.

---

### Task

Write a solution to find the nth highest **distinct** salary from the `Employee` table. If there are fewer than n distinct salaries, return `null`.

---

### Example 1

#### Input:
**Employee Table**:
| id | salary |
|----|--------|
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

`n = 2`

#### Output:
| getNthHighestSalary(2) |
|-------------------------|
| 200                     |

---

### Example 2

#### Input:
**Employee Table**:
| id | salary |
|----|--------|
| 1  | 100    |

`n = 2`

#### Output:
| getNthHighestSalary(2) |
|-------------------------|
| null                   |

---

### Constraints
1. The input contains at least one salary record.
2. `N` is a positive integer.


In [None]:
# schema
data = [[1, 100], [2, 200], [3, 300]]
employee = pd.DataFrame(data, columns=['id', 'salary']).astype({'id':'int64', 'salary':'int64'})

In [None]:
import pandas as pd

def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    column = f"getNthHighestSalary({N})"
    # Distinct salaries, highest first
    distinct = employee.sort_values(by="salary", ascending=False).drop_duplicates(subset="salary", keep="first")
    # No Nth highest salary if N is not positive or exceeds the number of distinct salaries
    if N <= 0 or employee["salary"].nunique() < N:
        return pd.DataFrame({column: [pd.NA]})
    nth = distinct.iloc[N - 1]["salary"]
    return pd.DataFrame({column: [nth]})
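A more compact variant, offered as a sketch rather than the reference solution: take the distinct salaries with `Series.drop_duplicates`, keep the `N` largest with `Series.nlargest`, and read the last one. The name `nth_highest_salary_nlargest` is hypothetical; like the solution above, it returns a null value when `N` is not positive or fewer than `N` distinct salaries exist.

```python
import pandas as pd

def nth_highest_salary_nlargest(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    column = f"getNthHighestSalary({N})"
    if N <= 0:
        return pd.DataFrame({column: [None]})
    # Distinct salaries only, so tied salaries count once
    top = employee["salary"].drop_duplicates().nlargest(N)
    # Fewer than N distinct salaries means there is no Nth highest
    if len(top) < N:
        return pd.DataFrame({column: [None]})
    # nlargest returns values in descending order, so the last one is the Nth highest
    return pd.DataFrame({column: [top.iloc[-1]]})
```

Because the values are de-duplicated first, salaries `[100, 200, 200]` with `N = 2` yield `100`, not `200`.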

### Problem: 176. Second Highest Salary

#### Difficulty: Medium  
#### Topics: SQL, Pandas  

---

### Table Schema

#### Employee Table:

| Column Name | Type  |
|-------------|-------|
| id          | int   |
| salary      | int   |

- `id` is the primary key (column with unique values) for this table.
- Each row of this table contains information about the salary of an employee.

---

### Task

Write a solution to find the second highest distinct salary from the `Employee` table. If there is no second highest salary, return `null` (or `None` in Pandas).

---

### Example 1

#### Input:
**Employee Table**:
| id | salary |
|----|--------|
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

#### Output:
| SecondHighestSalary |
|---------------------|
| 200                 |

---

### Example 2

#### Input:
**Employee Table**:
| id | salary |
|----|--------|
| 1  | 100    |

#### Output:
| SecondHighestSalary |
|---------------------|
| null                |

---

### Constraints
1. The input contains at least one salary record.
2. The salary column will not contain negative values.
3. `null` or `None` should be returned if there is no second highest salary.


In [None]:
import pandas as pd

def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    # Distinct salaries, highest first
    distinct = employee.sort_values(by="salary", ascending=False).drop_duplicates(subset="salary", keep="first")
    # Fewer than two distinct salaries means there is no second highest
    if employee["salary"].nunique() < 2:
        return pd.DataFrame({"SecondHighestSalary": [None]})
    second_salary = distinct.iloc[1]["salary"]
    return pd.DataFrame({"SecondHighestSalary": [second_salary]})
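The same answer can also be computed without sorting, mirroring the common SQL pattern `SELECT MAX(salary) FROM Employee WHERE salary < (SELECT MAX(salary) FROM Employee)`: the second highest distinct salary is the maximum among salaries strictly below the overall maximum. This is a sketch; `second_highest_salary_max` is a hypothetical name.

```python
import pandas as pd

def second_highest_salary_max(employee: pd.DataFrame) -> pd.DataFrame:
    top = employee["salary"].max()
    # Salaries strictly below the maximum; empty if every salary is the same
    below = employee.loc[employee["salary"] < top, "salary"]
    # An empty Series would give NaN from max(), so return None explicitly
    second = below.max() if not below.empty else None
    return pd.DataFrame({"SecondHighestSalary": [second]})
```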


### Problem: 184. Department Highest Salary

**Difficulty:** Medium  
**Topics:** SQL, Pandas  
**Companies:** Various  

---

#### Schema

**Table:** `Employee`

| Column Name  | Type    |
|--------------|---------|
| id           | int     |
| name         | varchar |
| salary       | int     |
| departmentId | int     |

- `id` is the primary key (a column with unique values) for this table.
- `departmentId` is a foreign key referencing the `id` column in the `Department` table.
- Each row represents an employee's ID, name, salary, and department ID.

---

**Table:** `Department`

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| name        | varchar |

- `id` is the primary key (a column with unique values) for this table.
- Each row represents a department's ID and name (guaranteed to be non-NULL).

---

### Task

Write a solution to find employees who have the **highest salary** in each department.

---

### Result Format

The result must be formatted as follows:

| Column Name  | Type    |
|--------------|---------|
| Department   | varchar |
| Employee     | varchar |
| Salary       | int     |

Return the result in any order.

---

### Example

#### Example 1:

**Input:**  

**`Employee` Table:**

| id | name  | salary | departmentId |
|----|-------|--------|--------------|
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |

**`Department` Table:**

| id | name  |
|----|-------|
| 1  | IT    |
| 2  | Sales |

**Output:**  

| Department | Employee | Salary |
|------------|----------|--------|
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
| IT         | Max      | 90000  |

---

### Explanation:

- In the IT department, **Jim** and **Max** both have the highest salary of **90000**.
- In the Sales department, **Henry** has the highest salary of **80000**. 

Return all employees with the highest salary per department in the specified format. Order of the results does not matter.
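Instead of merging against a separate per-department maximum table, each department's maximum can be broadcast back onto its own rows with `groupby(...).transform("max")` and used as a boolean filter. A sketch under that approach; `department_highest_salary_transform` is a hypothetical name.

```python
import pandas as pd

def department_highest_salary_transform(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    # Each row receives the maximum salary of its own department
    dept_max = employee.groupby("departmentId")["salary"].transform("max")
    # Keep every employee tied for the top salary in their department
    top = employee[employee["salary"] == dept_max]
    # Attach department names; overlapping columns get _x (employee) and _y (department) suffixes
    merged = top.merge(department, left_on="departmentId", right_on="id")
    return merged[["name_y", "name_x", "salary"]].rename(
        columns={"name_y": "Department", "name_x": "Employee", "salary": "Salary"}
    )
```

Ties are kept automatically, so both Jim and Max appear for IT in the example.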

In [None]:
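import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    # Attach department names; overlapping columns get _x (employee) and _y (department) suffixes
    merged = pd.merge(employee, department, left_on="departmentId", right_on="id")
    merged = merged.drop(columns=["id_y"])
    merged = merged.rename(columns={"name_x": "Employee", "name_y": "Department"})
    # Highest salary per department, grouped by id in case two departments share a name
    max_salary_by_department = merged.groupby("departmentId")["salary"].max().reset_index()
    # Keep only employees whose salary equals their department's maximum
    result = pd.merge(merged, max_salary_by_department, on=["departmentId", "salary"])
    # The required output column is "Salary" (capital S)
    return result[["Department", "Employee", "salary"]].rename(columns={"salary": "Salary"})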


In [None]:
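# Alternative solution: filter on the per-department maximum first, then attach names
def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    # Per-department maximum salary; departmentId becomes the index
    max_salary = employee.groupby(by=["departmentId"])[["salary"]].max()
    # right_on may refer to the index level "departmentId" as well as the "salary" column
    top = pd.merge(employee, max_salary, left_on=["departmentId", "salary"], right_on=["departmentId", "salary"])
    result = pd.merge(top, department, left_on="departmentId", right_on="id")
    return result[["name_y", "name_x", "salary"]].rename(
        columns={"name_y": "Department", "name_x": "Employee", "salary": "Salary"}
    )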
    

### 178. Rank Scores

**Difficulty**: Medium  
**Topics**: SQL, Pandas  
**Schema**:  

#### Table: `Scores`

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| score       | decimal |

- `id` is the primary key (a column with unique values) for this table.
- Each row of this table contains the score of a game. The `score` is a floating-point value with two decimal places.

---

### Problem Statement

Write a solution to find the **rank** of the scores in the `Scores` table. The ranking should be calculated according to the following rules:

1. The scores should be ranked **from the highest to the lowest**.
2. If two scores are tied, they should share the **same rank**.
3. After a tie, the next ranking number should be the **next consecutive integer value**. There should be no gaps in the ranks.

The result table should be **ordered by score in descending order**.
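Rule 3 is what separates a *dense* rank from the more common *min* (competition) rank. On the example scores, `Series.rank` with `method="min"` leaves gaps after ties, while `method="dense"` does not; both return floats, hence the `astype(int)` in this sketch.

```python
import pandas as pd

scores = pd.Series([3.50, 3.65, 4.00, 3.85, 4.00, 3.65])

# Competition ranking: tied values share a rank, then the next rank skips ahead
min_rank = scores.rank(method="min", ascending=False).astype(int)
# Dense ranking: tied values share a rank, and the next rank is the next integer
dense_rank = scores.rank(method="dense", ascending=False).astype(int)

print(pd.DataFrame({"score": scores, "min": min_rank, "dense": dense_rank}))
```

Read down the scores sorted from highest to lowest, `min` gives 1, 1, 3, 4, 4, 6, whereas `dense` gives 1, 1, 2, 3, 3, 4, which is the ranking the expected output requires.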

---

### Output

The result table should have the following format:

| Column Name | Type |
|-------------|------|
| score       | decimal |
| rank        | int    |

---

### Example

#### Input:
**Scores Table**:

| id | score |
|----|-------|
| 1  | 3.50  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.85  |
| 5  | 4.00  |
| 6  | 3.65  |

---

#### Output:

| score | rank |
|-------|------|
| 4.00  | 1    |
| 4.00  | 1    |
| 3.85  | 2    |
| 3.65  | 3    |
| 3.65  | 3    |
| 3.50  | 4    |

---

### Explanation
- The score `4.00` appears twice and ranks **1**.
- The score `3.85` ranks **2**.
- The score `3.65` appears twice and ranks **3**.
- The score `3.50` ranks **4**. 

The output table is ordered by `score` in descending order.

In [None]:
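import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    # Highest score first, as the output order requires
    rank_scores = scores.sort_values(by="score", ascending=False)
    # Dense ranking: ties share a rank and there are no gaps; rank() returns floats, so cast to int
    rank_scores["rank"] = rank_scores["score"].rank(method="dense", ascending=False).astype(int)
    return rank_scores[["score", "rank"]]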

### 196. Delete Duplicate Emails

**Difficulty**: Easy  
**Topics**: SQL, Pandas  
**Schema**:  

#### Table: `Person`

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| email       | varchar |

- `id` is the primary key (a column with unique values) for this table.
- Each row of this table contains an email. The emails will not contain uppercase letters.

---

### Problem Statement

Write a solution to delete all duplicate emails from the `Person` table, keeping, for each email, only the row with the **smallest `id`**.

- For **SQL users**, write a `DELETE` statement (not a `SELECT` statement).
- For **Pandas users**, modify the `Person` table **in place**.

After running your script, the final output will be the modified `Person` table.

---

### Output Format

The `Person` table should only include unique emails, with the row corresponding to the **smallest `id`** for each duplicate email.  
The final order of the `Person` table **does not matter**.

---

### Example

#### Input:  
**Person Table**:

| id  | email              |
|-----|--------------------|
| 1   | john@example.com   |
| 2   | bob@example.com    |
| 3   | john@example.com   |

---

#### Output:  
**Person Table**:

| id  | email              |
|-----|--------------------|
| 1   | john@example.com   |
| 2   | bob@example.com    |

---

### Explanation:
- `john@example.com` appears twice in the table. The row with the smallest `id` (`id = 1`) is kept, and the other (`id = 3`) is removed.  
- `bob@example.com` appears only once, so it remains unchanged.
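One way to meet the in-place requirement in pandas, sketched under the assumption of a function named `delete_duplicate_emails(person) -> None` (the name is illustrative): sort by `id` so that each email's first occurrence carries the smallest `id`, then drop later duplicates with `inplace=True` so the caller's DataFrame is modified rather than copied.

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Sort by id so the first occurrence of each email carries the smallest id
    person.sort_values(by="id", inplace=True)
    # Remove later occurrences of each email; inplace=True mutates the caller's frame
    person.drop_duplicates(subset="email", keep="first", inplace=True)
```

Calling `person.drop_duplicates(...)` without `inplace=True` would return a new DataFrame and leave the original `Person` table unchanged, which would not satisfy the task.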