# Advanced Pandas 1

In [None]:
# Note: the datasets referenced below are not included in this notebook
import pandas as pd

# 183. Customers Who Never Order

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

## Table Schemas

### Table: Customers

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| name        | varchar |

- `id` is the **primary key** for this table.
- Each row represents the ID and name of a customer.

### Table: Orders

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| customerId  | int     |

- `id` is the **primary key** for this table.
- `customerId` is a **foreign key** referencing the `id` column in the `Customers` table.
- Each row indicates an order's ID and the corresponding customer who placed the order.

---

## Problem Statement

Write a solution to find all customers who have **never placed an order**.  

- Return the result table in **any order**.

---

## Input

### Customers Table:

| id | name  |
|----|-------|
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |

### Orders Table:

| id | customerId |
|----|------------|
| 1  | 3          |
| 2  | 1          |

---

## Output

### Expected Result:

| Customers |
|-----------|
| Henry     |
| Max       |

---

## Explanation:

1. **Customer 1 (Joe)**: Appears in the `Orders` table (`customerId = 1`).  
2. **Customer 2 (Henry)**: Does **not** appear in the `Orders` table.  
3. **Customer 3 (Sam)**: Appears in the `Orders` table (`customerId = 3`).  
4. **Customer 4 (Max)**: Does **not** appear in the `Orders` table.  
5. The customers **Henry** and **Max** are returned as they have **never placed an order**.


In [None]:
# import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # A left join keeps every customer; those without orders get NaN in customerId
    result = pd.merge(customers, orders, left_on='id', right_on='customerId', how='left')
    result = result[result['customerId'].isna()]
    # The expected output column is named "Customers"
    result = result[['name']].rename(columns={'name': 'Customers'})
    return result
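
The same result can also be reached without a join. The following is a minimal sketch, assuming the sample tables from the Input section; the function name `find_customers_isin` is hypothetical and not part of the original solution. It filters `Customers` with `isin`, which acts as an anti-join against `Orders.customerId`.

```python
import pandas as pd

# Hypothetical alternative to the merge-based solution: an anti-join via isin
def find_customers_isin(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Keep customers whose id never appears in orders.customerId
    never_ordered = customers[~customers["id"].isin(orders["customerId"])]
    return never_ordered[["name"]].rename(columns={"name": "Customers"})

# Sample data copied from the Input section
customers = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["Joe", "Henry", "Sam", "Max"]})
orders = pd.DataFrame({"id": [1, 2], "customerId": [3, 1]})

result = find_customers_isin(customers, orders)
print(result["Customers"].tolist())  # ['Henry', 'Max']
```

Because no merge takes place, a customer with several orders cannot produce duplicate rows, and the clashing `id` columns of the two tables never meet.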

# 1148. Article Views I

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

## Table Schema: Views

| Column Name  | Type    |
|--------------|---------|
| article_id   | int     |
| author_id    | int     |
| viewer_id    | int     |
| view_date    | date    |

- There is **no primary key** for this table (it may have duplicate rows).  
- Each row indicates that a `viewer` viewed an `article` (written by an `author`) on a specific date.  
- **Note**: If `author_id` equals `viewer_id`, it means the author viewed their own article.

---

## Problem Statement

Write a solution to find all the authors (`author_id`) who have viewed at least one of their own articles.  

Return the result table sorted by `id` (author_id) in **ascending order**.

---

## Input

### Views Table Example:

| article_id | author_id | viewer_id | view_date  |
|------------|-----------|-----------|------------|
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |

---

## Output

### Expected Result:

| id   |
|------|
| 4    |
| 7    |

---

## Explanation:

1. **Author 3**: Did not view their own articles.  
2. **Author 7**: Viewed article `2`, where `viewer_id = author_id = 7`.  
3. **Author 4**: Viewed article `3`, where `viewer_id = author_id = 4`.  
4. The authors **4** and **7** satisfy the condition and are returned in ascending order.


In [None]:
# import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Keep only the rows where the author viewed their own article
    matching_rows = views[views['author_id'] == views['viewer_id']]
    # One row per author, with the column renamed to match the expected output
    unique_ids = matching_rows[['author_id']].drop_duplicates().rename(columns={'author_id': 'id'})
    # Sort by id in ascending order
    return unique_ids.sort_values(by='id').reset_index(drop=True)
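
To see the filter, de-duplication, and sort steps on concrete data, here is a minimal sketch that rebuilds the sample `Views` table from the Input section. The function name `article_views_sketch` is hypothetical, and the dates are kept as plain strings for simplicity.

```python
import pandas as pd

# Hypothetical helper illustrating the same steps on the sample data
def article_views_sketch(views: pd.DataFrame) -> pd.DataFrame:
    # Select author_id only for rows where the author is also the viewer
    own_views = views.loc[views["author_id"] == views["viewer_id"], "author_id"]
    # Drop repeated authors, sort ascending, and name the column "id"
    return own_views.drop_duplicates().sort_values().to_frame(name="id").reset_index(drop=True)

# Sample data copied from the Input section (note the duplicated last row)
views = pd.DataFrame({
    "article_id": [1, 1, 2, 2, 4, 3, 3],
    "author_id":  [3, 3, 7, 7, 7, 4, 4],
    "viewer_id":  [5, 6, 7, 6, 1, 4, 4],
    "view_date":  ["2019-08-01", "2019-08-02", "2019-08-01", "2019-08-02",
                   "2019-07-22", "2019-07-21", "2019-07-21"],
})

result = article_views_sketch(views)
print(result["id"].tolist())  # [4, 7]
```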

# 1873. Calculate Special Bonus

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

## Table Schema: Employees

| Column Name  | Type    |
|--------------|---------|
| employee_id  | int     |
| name         | varchar |
| salary       | int     |

- `employee_id` is the primary key (column with unique values) for this table.  
- Each row of this table indicates the employee ID, employee name, and salary.

---

## Problem Statement

Write a solution to calculate the bonus of each employee. The bonus of an employee is:

1. **100% of their salary** if:
   - The `employee_id` is an **odd number**, **AND**
   - The employee's `name` does **not start** with the character **'M'**.
2. **0** otherwise.

Return the result table ordered by `employee_id`.

---

## Input

### Employees Table Example:

| employee_id | name    | salary |
|-------------|---------|--------|
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |

---

## Output

### Expected Result:

| employee_id | bonus |
|-------------|-------|
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |

---

## Explanation:

1. The employees with IDs **2** and **8** get **0 bonus** because they have an **even `employee_id`**.
2. The employee with ID **3** gets **0 bonus** because their **name starts with 'M'**.
3. The employees with IDs **7** and **9** get a bonus equal to **100% of their salary** as they meet both conditions.


In [None]:
# A lambda is a small anonymous function; this one adds two numbers
add = lambda x, y: x + y

# Use the lambda function
result = add(3, 5)

print(result)  # Output: 8

In [None]:
# import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    # Full salary for odd IDs whose name does not start with 'M'; otherwise 0
    bonus = employees.apply(
        lambda row: row['salary'] if row['employee_id'] % 2 == 1 and not row['name'].startswith('M') else 0,
        axis=1,
    )
    result = employees[['employee_id']].assign(bonus=bonus)
    return result.sort_values(by='employee_id')
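
The row-wise `apply` can also be expressed with boolean masks. The sketch below is one possible vectorized variant, not the original solution; the function name `calculate_special_bonus_vectorized` is hypothetical, and the data is the sample `Employees` table from the Input section.

```python
import pandas as pd

# Hypothetical vectorized variant of the bonus rule
def calculate_special_bonus_vectorized(employees: pd.DataFrame) -> pd.DataFrame:
    odd_id = employees["employee_id"] % 2 == 1
    name_not_m = ~employees["name"].str.startswith("M")
    # Series.where keeps the salary where the condition holds and uses 0 elsewhere
    bonus = employees["salary"].where(odd_id & name_not_m, 0)
    result = pd.DataFrame({"employee_id": employees["employee_id"], "bonus": bonus})
    return result.sort_values("employee_id").reset_index(drop=True)

# Sample data copied from the Input section
employees = pd.DataFrame({
    "employee_id": [2, 3, 7, 8, 9],
    "name": ["Meir", "Michael", "Addilyn", "Juan", "Kannon"],
    "salary": [3000, 3800, 7400, 6100, 7700],
})

result = calculate_special_bonus_vectorized(employees)
print(result["bonus"].tolist())  # [0, 0, 7400, 0, 7700]
```

This version builds a new DataFrame, so the caller's `employees` table does not gain a `bonus` column as a side effect.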

# 1667. Fix Names in a Table

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

## Table Schema: Users

| Column Name | Type    |
|-------------|---------|
| user_id     | int     |
| name        | varchar |

- `user_id` is the **primary key** for this table.  
- This table contains the `user_id` and the `name` of a user.  
- The `name` consists of only **lowercase** and **uppercase** characters.

---

## Problem Statement

Write a solution to **fix the names** such that:
- Only the **first character** is uppercase.
- The rest of the characters are lowercase.

Return the result table ordered by `user_id`.

---

## Input

### Users Table:

| user_id | name  |
|---------|-------|
| 1       | aLice |
| 2       | bOB   |

---

## Output

### Expected Result:

| user_id | name  |
|---------|-------|
| 1       | Alice |
| 2       | Bob   |

---

## Explanation:

1. For user `1`, the original name is `"aLice"`. After fixing, it becomes `"Alice"`.  
2. For user `2`, the original name is `"bOB"`. After fixing, it becomes `"Bob"`.  
3. The result is returned sorted by `user_id`.


In [None]:
# import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    # str.capitalize() uppercases the first character and lowercases the rest
    users['name'] = users['name'].str.capitalize()
    return users.sort_values(by='user_id')
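
As a quick check against the sample `Users` table, here is a minimal sketch of a variant that leaves the input DataFrame unchanged. The function name `fix_names_copy` is hypothetical; the rows are deliberately given out of order to show the sort.

```python
import pandas as pd

# Hypothetical non-mutating variant: assign returns a new DataFrame
def fix_names_copy(users: pd.DataFrame) -> pd.DataFrame:
    fixed = users.assign(name=users["name"].str.capitalize())
    return fixed.sort_values("user_id").reset_index(drop=True)

# Sample data from the Input section, with the rows reversed
users = pd.DataFrame({"user_id": [2, 1], "name": ["bOB", "aLice"]})

result = fix_names_copy(users)
print(result["name"].tolist())  # ['Alice', 'Bob']
```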
    