# Advanced Pandas 2

In [None]:
import pandas as pd

# 1517. Find Users With Valid E-Mails

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

---

## Table Schema: Users

| Column Name | Type    |
|-------------|---------|
| user_id     | int     |
| name        | varchar |
| mail        | varchar |

- `user_id` is the **primary key** (column with unique values) for this table.  
- This table contains information about users signed up on a website, including their emails, some of which may be invalid.

---

## Problem Statement

Write a solution to find the users who have **valid emails**.  
A valid email satisfies the following conditions:
1. **Prefix name**:
   - May contain **letters** (upper or lower case), **digits**, **underscore `_`**, **period `.`**, and/or **dash `-`**.
   - Must **start with a letter**.
2. **Domain**:
   - Must be `@leetcode.com`.

---

## Return:
- The result table should include the `user_id`, `name`, and `mail` of users with valid emails.
- The result can be returned in **any order**.

---

## Input

### Users Table:

| user_id | name      | mail                    |
|---------|-----------|-------------------------|
| 1       | Winston   | winston@leetcode.com    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
| 5       | Marwan    | quarz#2020@leetcode.com |
| 6       | David     | david69@gmail.com       |
| 7       | Shapiro   | .shapo@leetcode.com     |

---

## Output

### Expected Result:

| user_id | name      | mail                    |
|---------|-----------|-------------------------|
| 1       | Winston   | winston@leetcode.com    |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |

---

## Explanation:

1. **User 1**:
   - Email `winston@leetcode.com` is valid.

2. **User 2**:
   - Email `jonathanisgreat` does not have a domain. Invalid.

3. **User 3**:
   - Email `bella-@leetcode.com` is valid.

4. **User 4**:
   - Email `sally.come@leetcode.com` is valid.

5. **User 5**:
   - Email `quarz#2020@leetcode.com` contains the `#` symbol, which is not allowed. Invalid.

6. **User 6**:
   - Email `david69@gmail.com` does not belong to the `@leetcode.com` domain. Invalid.

7. **User 7**:
   - Email `.shapo@leetcode.com` starts with a period (`.`). Invalid.
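
The two rules above translate fairly directly into a single regular expression. The snippet below is a minimal sketch, not the reference solution: it rebuilds the sample `Users` table and checks each address with `re.fullmatch`. The names `PATTERN`, `is_valid`, and `valid_ids` are illustrative choices and do not come from the problem statement.

```python
import re

import pandas as pd

# Sample rows copied from the Users table above.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6, 7],
    "name": ["Winston", "Jonathan", "Annabelle", "Sally", "Marwan", "David", "Shapiro"],
    "mail": [
        "winston@leetcode.com",
        "jonathanisgreat",
        "bella-@leetcode.com",
        "sally.come@leetcode.com",
        "quarz#2020@leetcode.com",
        "david69@gmail.com",
        ".shapo@leetcode.com",
    ],
})

# One leading letter, then any mix of letters, digits, '_', '.', '-',
# followed by the fixed domain.
PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com")

# fullmatch() requires the whole string to match, so no ^ or $ anchors are needed.
is_valid = users["mail"].map(lambda mail: PATTERN.fullmatch(mail) is not None)
valid_ids = users.loc[is_valid, "user_id"].tolist()
print(valid_ids)  # [1, 3, 4]
```

Inside a character class, `.` is literal, and `-` is literal when it comes last, so neither needs escaping there.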


In [None]:
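# import pandas as pd

def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    # Prefix: one letter, then letters, digits, '_', '.', or '-'; domain: exactly @leetcode.com
    email_regex = r'^[a-zA-Z][a-zA-Z0-9._-]*@leetcode\.com$'
    # na=False treats missing mails as invalid instead of propagating NaN into the mask
    users = users[users['mail'].str.match(email_regex, na=False)]
    return users[['user_id', 'name', 'mail']]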

In [None]:
# more efficient solution, 265 ms
# import pandas as pd

def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    # The prefix class must not contain '|', and the domain must be exactly @leetcode.com
    pattern = r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'
    valid_emails_df = users[users['mail'].str.match(pattern, na=False)]
    return valid_emails_df


### 1527. Patients With a Condition
**Difficulty**: Easy  
**Topics**: SQL, Pandas  

---

#### Table: Patients

| Column Name   | Type    |
|---------------|---------|
| patient_id    | int     |
| patient_name  | varchar |
| conditions    | varchar |

- `patient_id` is the primary key (a column with unique values) for this table.  
- `conditions` contains 0 or more codes separated by spaces.  
- This table contains information about the patients in the hospital.

---

### Problem
Write a solution to find the `patient_id`, `patient_name`, and `conditions` of the patients who have Type I Diabetes. Type I Diabetes always starts with the `DIAB1` prefix.

Return the result table in any order.

---

### Example 1

#### Input:  
**Patients table**:
| patient_id | patient_name | conditions   |
|------------|--------------|--------------|
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |

#### Output:  
| patient_id | patient_name | conditions   |
|------------|--------------|--------------|
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |

#### Explanation:  
Bob and George both have a condition that starts with `DIAB1`.  
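
Because `DIAB1` has to be the prefix of a code, a plain substring search is not quite enough. The sketch below compares a naive substring check with a token-based check. Rows 1 to 5 come from the example above; row 6 (`Eve`, `SADIAB100`) is a hypothetical case added only to show the difference, and the helper name `has_diab1` is likewise illustrative.

```python
import pandas as pd

# Rows 1-5 come from the example above; row 6 is a hypothetical extra case
# where "DIAB1" appears in the middle of a code rather than at its start.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "patient_name": ["Daniel", "Alice", "Bob", "George", "Alain", "Eve"],
    "conditions": ["YFEV COUGH", "", "DIAB100 MYOP", "ACNE DIAB100", "DIAB201", "SADIAB100"],
})

# Naive substring search: also flags the hypothetical row 6.
naive = patients["conditions"].str.contains("DIAB1", regex=False)


def has_diab1(conditions: str) -> bool:
    """Split on whitespace and test each code's prefix."""
    return any(code.startswith("DIAB1") for code in conditions.split())


strict = patients["conditions"].map(has_diab1)

naive_ids = patients.loc[naive, "patient_id"].tolist()
strict_ids = patients.loc[strict, "patient_id"].tolist()
print(naive_ids)   # [3, 4, 6]
print(strict_ids)  # [3, 4]
```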


In [None]:
# import pandas as pd

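def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # \b is a word boundary, so DIAB1 must start a word. Note that \b also matches
    # after punctuation (e.g. "X-DIAB1"), not only after a space.
    # na=False keeps rows with missing conditions out of the result instead of raising.
    return patients[patients['conditions'].str.contains(r'\bDIAB1', regex=True, na=False)]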

In [None]:
# solution from Vitalii
# import pandas as pd

def find_DIAB1(conditions: list[str]) -> bool:
    return any(condition.startswith('DIAB1') for condition in conditions)

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    return patients[
        list(
            map(
                find_DIAB1,
                patients.conditions.str.split()
            )
        )
    ]

In [None]:
# General solution
# import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
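    # DIAB1 must be at the start of the string or directly after a space.
    # A non-capturing group avoids the pandas warning about match groups in str.contains.
    return patients[patients.conditions.str.contains(r'(?:^| )DIAB1', regex=True, na=False)]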

### 177. Nth Highest Salary
**Difficulty**: Medium  
**Topics**: SQL, Pandas  

---

### Table: Employee

| Column Name | Type |
|-------------|------|
| id          | int  |
| salary      | int  |

- `id` is the primary key (a column with unique values) for this table.
- Each row of this table contains information about the salary of an employee.

---

### Problem
Write a solution to find the **nth highest salary** from the `Employee` table. If there is no **nth highest salary**, return `null`.

The result format is in the following example.

---

### Example 1

#### Input:  
**Employee table**:
| id | salary |
|----|--------|
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

**n** = 2

#### Output:  
| getNthHighestSalary(2) |
|------------------------|
| 200                    |

---

### Example 2

#### Input:  
**Employee table**:
| id | salary |
|----|--------|
| 1  | 100    |

**n** = 2

#### Output:  
| getNthHighestSalary(2) |
|------------------------|
| null                   |
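
The key detail is that "nth highest" is counted over distinct salary values. The sketch below works on a plain `Series` and reproduces both examples. The helper name `nth_distinct_highest` and the third input, which contains a duplicate salary, are hypothetical additions for illustration.

```python
import pandas as pd


def nth_distinct_highest(salaries: pd.Series, n: int):
    """Return the n-th highest distinct value, or None if it does not exist."""
    # Distinct values sorted from highest to lowest.
    ranked = salaries.drop_duplicates().sort_values(ascending=False)
    if n <= 0 or n > len(ranked):
        return None
    return int(ranked.iloc[n - 1])


example_1 = pd.Series([100, 200, 300], name="salary")
example_2 = pd.Series([100], name="salary")
# Hypothetical input with a duplicate: the two 200s count as a single rank.
with_duplicates = pd.Series([100, 200, 200], name="salary")

print(nth_distinct_highest(example_1, 2))        # 200
print(nth_distinct_highest(example_2, 2))        # None
print(nth_distinct_highest(with_duplicates, 2))  # 100
```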



In [None]:
# 681 ms
# struggled with this one
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    # One row per distinct salary, sorted from highest to lowest
    newsalary = employee.sort_values(by="salary", ascending=False).drop_duplicates(subset="salary", keep="first")
    if employee["salary"].nunique() < N or N <= 0:
        salary = pd.DataFrame({f"getNthHighestSalary({N})": [pd.NA]})
    else:
        p = newsalary.iloc[N - 1]["salary"]
        salary = pd.DataFrame({f"getNthHighestSalary({N})": [p]})
    return salary

In [None]:
# 288 ms
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    # 1. Take the distinct salaries and keep the N largest.
    #    drop_duplicates() removes repeated salary values;
    #    nlargest(N) returns the N largest of them, sorted from highest to lowest.
    salary = employee["salary"].drop_duplicates().nlargest(N)

    # 2. If N is invalid (<= 0) or there are fewer than N distinct salaries,
    #    there is no N-th highest salary, so the result is None.
    if len(salary) < N or N <= 0:
        val = None
    else:
        # 3. The values are in descending order, so the last one is the N-th highest.
        val = salary.tail(1).values[0]

    # 4. Return a one-row DataFrame with a column named "getNthHighestSalary(N)".
    return pd.DataFrame({f"getNthHighestSalary({N})": [val]})

