# Advanced Pandas 5

In [None]:
import pandas as pd

# 2356. Number of Unique Subjects Taught by Each Teacher

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

## Schema

### Table: `Teacher`
| Column Name | Type |
|-------------|------|
| `teacher_id`  | int  |
| `subject_id`  | int  |
| `dept_id`     | int  |

- `(subject_id, dept_id)` is the primary key (combination of columns with unique values) of this table.
- Each row in this table indicates that the teacher with `teacher_id` teaches the subject `subject_id` in the department `dept_id`.

---

## Task

Write a solution to calculate the number of unique subjects each teacher teaches in the university.  

Return the result table in **any order**.

---

## Example

### Input:  
`Teacher` table:
| teacher_id | subject_id | dept_id |
|------------|------------|---------|
| 1          | 2          | 3       |
| 1          | 2          | 4       |
| 1          | 3          | 3       |
| 2          | 1          | 1       |
| 2          | 2          | 1       |
| 2          | 3          | 1       |
| 2          | 4          | 1       |

---

### Output:
| teacher_id | cnt |
|------------|-----|
| 1          | 2   |
| 2          | 4   |

---

### Explanation:
#### Teacher 1:
- Teaches **subject 2** in departments **3** and **4**.
- Teaches **subject 3** in department **3**.

Unique subjects taught: **2**.

#### Teacher 2:
- Teaches **subject 1** in department **1**.
- Teaches **subject 2** in department **1**.
- Teaches **subject 3** in department **1**.
- Teaches **subject 4** in department **1**.

Unique subjects taught: **4**.


In [None]:
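def count_unique_subjects(teacher: pd.DataFrame) -> pd.DataFrame:
    # Keep one row per (teacher, subject) pair, so a subject taught in several departments counts once
    unique_subjects = teacher.drop_duplicates(['teacher_id', 'subject_id'])
    # Count the remaining subjects per teacher
    result = unique_subjects.groupby('teacher_id')['subject_id'].count().reset_index()
    result.columns = ['teacher_id', 'cnt']
    return result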

In [None]:
# another solution 250 ms: nunique() counts distinct subjects per teacher directly
def count_unique_subjects(teacher: pd.DataFrame) -> pd.DataFrame:
    return (teacher.groupby('teacher_id', as_index=False)['subject_id']
            .nunique()
            .rename(columns={'subject_id': 'cnt'}))
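To check both strategies against the example, the sketch below rebuilds the `Teacher` table from the problem statement and runs them side by side. The function names `count_unique_subjects_dedup` and `count_unique_subjects_nunique` are hypothetical, introduced only so both versions can coexist; the second one uses `reset_index(name="cnt")` instead of `as_index=False` plus `rename`, which should produce the same two columns.

```python
import pandas as pd

# Example `Teacher` table from the problem statement.
teacher = pd.DataFrame({
    "teacher_id": [1, 1, 1, 2, 2, 2, 2],
    "subject_id": [2, 2, 3, 1, 2, 3, 4],
    "dept_id":    [3, 4, 3, 1, 1, 1, 1],
})

def count_unique_subjects_dedup(teacher: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical name. Drop repeated (teacher, subject) pairs, then count rows per teacher.
    pairs = teacher.drop_duplicates(["teacher_id", "subject_id"])
    return pairs.groupby("teacher_id")["subject_id"].count().reset_index(name="cnt")

def count_unique_subjects_nunique(teacher: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical name. nunique() counts distinct subject_id values per teacher directly.
    return teacher.groupby("teacher_id")["subject_id"].nunique().reset_index(name="cnt")

dedup_result = count_unique_subjects_dedup(teacher)
nunique_result = count_unique_subjects_nunique(teacher)
print(dedup_result)
print(nunique_result)
```

Teacher 1 teaches subjects {2, 3} and teacher 2 teaches {1, 2, 3, 4}, so both versions should report counts of 2 and 4.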

# 596. Classes More Than 5 Students

**Difficulty:** Easy  
**Topics:** SQL, Pandas

---

### Table: Courses

| Column Name | Type    |
|-------------|---------|
| student     | varchar |
| class       | varchar |

- *(student, class)* is the primary key (combination of columns with unique values) for this table.
- Each row of this table indicates the name of a student and the class in which they are enrolled.

---

### Problem

Write a solution to find all the classes that have at least five students.

Return the result table in any order.

The result format is in the following example:

---

### Example 1:

**Input:**  
Courses table:

| student | class    |
|---------|----------|
| A       | Math     |
| B       | English  |
| C       | Math     |
| D       | Biology  |
| E       | Math     |
| F       | Computer |
| G       | Math     |
| H       | Math     |
| I       | Math     |

**Output:**  

| class   |
|---------|
| Math    |

**Explanation:**  
- Math has 6 students, so we include it.
- English has 1 student, so we do not include it.
- Biology has 1 student, so we do not include it.
- Computer has 1 student, so we do not include it.


In [None]:
# my solution 380 ms
def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    # Rows per class; each row is a unique (student, class) pair, so this equals the student count
    class_counts = courses.groupby('class').size().reset_index(name='student_count')
    filtered_classes = class_counts[class_counts['student_count'] >= 5]
    return filtered_classes[['class']]


In [None]:
# better solution 227 ms
def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    # Count students per class, then keep classes with at least five
    courses_count = courses.groupby('class')['student'].count().reset_index()
    return courses_count[courses_count['student'] >= 5][['class']]
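Both versions count rows per class, which equals the number of students only because `(student, class)` is the primary key, so no student appears twice in the same class. A `value_counts()` variant does the same counting on the single `class` column; it is sketched below as an untimed alternative, and the name `find_classes_value_counts` is hypothetical.

```python
import pandas as pd

# Example `Courses` table from the problem statement.
courses = pd.DataFrame({
    "student": list("ABCDEFGHI"),
    "class": ["Math", "English", "Math", "Biology", "Math",
              "Computer", "Math", "Math", "Math"],
})

def find_classes_value_counts(courses: pd.DataFrame) -> pd.DataFrame:
    # value_counts() returns a Series: index = class name, values = number of rows.
    counts = courses["class"].value_counts()
    # Keep classes with at least five rows and return their names as a one-column frame.
    big_classes = counts[counts >= 5].index.tolist()
    return pd.DataFrame({"class": big_classes})

result = find_classes_value_counts(courses)
print(result)
```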

# 586. Customer Placing the Largest Number of Orders

**Difficulty:** Easy

**Topics:** SQL, Pandas

## Table: Orders

| Column Name     | Type     |
|------------------|----------|
| order_number     | int      |
| customer_number  | int      |

- `order_number` is the primary key (column with unique values) for this table.
- This table contains information about the order ID and the customer ID.

---

## Problem Statement

Write a solution to find the `customer_number` for the customer who has placed the largest number of orders.

The test cases are generated so that **exactly one customer** will have placed more orders than any other customer.

The result format is in the following example.

---

### Example 1:

#### Input:
Orders table:

| order_number | customer_number |
|--------------|-----------------|
| 1            | 1               |
| 2            | 2               |
| 3            | 3               |
| 4            | 3               |

#### Output: 
| customer_number |
|-----------------|
| 3               |

#### Explanation:
The customer with number `3` has two orders, more than either customer `1` or `2`, who each have only one order. So the result is `customer_number` 3.

---

### Follow-up:
What if more than one customer has the largest number of orders? Can you find all the `customer_number` in this case?


In [None]:
# solution with 404 ms
def largest_orders(orders: pd.DataFrame) -> pd.DataFrame:
    # Count orders per customer
    order_counts = orders.groupby("customer_number").size().reset_index(name="order_count")
    # Keep every customer whose count equals the maximum (this also covers ties)
    max_orders = order_counts["order_count"].max()
    top_customer = order_counts[order_counts["order_count"] == max_orders]
    return top_customer[['customer_number']]


In [None]:
# solution with 245 ms
def largest_orders(orders: pd.DataFrame) -> pd.DataFrame:
    # mode() returns the most frequent customer_number(s); to_frame() makes it a one-column DataFrame
    return orders['customer_number'].mode().to_frame()
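For the follow-up, both approaches above already return every customer tied for the most orders: the first keeps all rows equal to the maximum count, and `mode()` returns every value sharing the top frequency. The check below uses a small hypothetical `Orders` table (not from the problem) in which customers 2 and 3 tie; the function names are hypothetical so both versions can run together.

```python
import pandas as pd

# Hypothetical orders: customers 2 and 3 each placed two orders.
orders = pd.DataFrame({
    "order_number":    [1, 2, 3, 4, 5],
    "customer_number": [1, 2, 2, 3, 3],
})

def largest_orders_filter(orders: pd.DataFrame) -> pd.DataFrame:
    # Count orders per customer, then keep every customer that reaches the maximum.
    counts = orders.groupby("customer_number").size().reset_index(name="order_count")
    top = counts[counts["order_count"] == counts["order_count"].max()]
    return top[["customer_number"]]

def largest_orders_mode(orders: pd.DataFrame) -> pd.DataFrame:
    # mode() returns all values tied for the highest frequency.
    return orders["customer_number"].mode().to_frame()

print(largest_orders_filter(orders))
print(largest_orders_mode(orders))
```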

### mode() in Pandas
The **mode()** function in Pandas returns the most frequently occurring value(s) in a Series or DataFrame. If there are multiple modes (values with the same highest frequency), it returns all of them.

Key Points:
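- For Series: Returns a Series containing the mode(s), sorted in ascending order.
- For DataFrame: Returns a DataFrame containing the mode(s) of each column; columns with fewer modes are padded with `NaN`.
- If every value occurs equally often (e.g., all values are unique), all of them tie for the highest frequency, so all of them are returned.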
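A few made-up Series and a small DataFrame illustrate these points; the third Series also shows that `NaN` is ignored by default (`dropna=True`).

```python
import numpy as np
import pandas as pd

# Ties: 1 and 2 both appear twice, so both are modes.
ties = pd.Series([2, 1, 2, 1, 3]).mode()

# All values unique: every value ties at frequency 1, so all are returned.
all_unique = pd.Series([30, 10, 20]).mode()

# NaN is skipped by default (dropna=True), even though it is the most common entry here.
with_nan = pd.Series([5.0, np.nan, np.nan, 5.0, np.nan]).mode()

# DataFrame: modes are computed column by column; column "a" has one mode,
# column "b" has three, so "a" is padded with NaN.
df_modes = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]}).mode()

print(ties.tolist(), all_unique.tolist(), with_nan.tolist())
print(df_modes)
```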

### to_frame() in Pandas
The **to_frame()** function is used to convert a Pandas Series into a DataFrame.

Key Points:
- The resulting DataFrame has one column, where the original Series values become the column's data.
- Pass `name=` to set the column label; otherwise the Series name is used (or `0` if the Series is unnamed).
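A short illustration of how the column label of the resulting DataFrame is chosen, using made-up values:

```python
import pandas as pd

s = pd.Series([3, 1], name="customer_number")

# A named Series keeps its name as the column label.
named = s.to_frame()

# The `name` argument overrides the column label.
renamed = s.to_frame(name="top_customer")

# An unnamed Series gets the default column label 0.
unnamed = pd.Series([3, 1]).to_frame()

print(named.columns.tolist(), renamed.columns.tolist(), unnamed.columns.tolist())
```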

In [None]:
# mode() returns the most frequent value(s) in the column, and to_frame() converts that Series into a DataFrame.

# 1484. Group Sold Products By The Date
**Difficulty:** Easy  
**Topics:** SQL, Pandas  

---

#### **Table: Activities**

| Column Name | Type    |
|-------------|---------|
| sell_date   | date    |
| product     | varchar |

- There is no primary key (column with unique values) for this table. It may contain duplicates.
- Each row of this table contains the product name and the date it was sold in a market.

---

### **Problem Statement**

Write a solution to find for each date the number of different products sold and their names.

- The sold product names for each date should be sorted lexicographically.
- Return the result table ordered by `sell_date`.

---

### **Example**

#### **Input:**  
**Activities table:**

| sell_date  | product     |
|------------|-------------|
| 2020-05-30 | Headphone   |
| 2020-06-01 | Pencil      |
| 2020-06-02 | Mask        |
| 2020-05-30 | Basketball  |
| 2020-06-01 | Bible       |
| 2020-06-02 | Mask        |
| 2020-05-30 | T-Shirt     |

---

#### **Output:**  

| sell_date  | num_sold | products                     |
|------------|----------|------------------------------|
| 2020-05-30 | 3        | Basketball,Headphone,T-Shirt |
| 2020-06-01 | 2        | Bible,Pencil                 |
| 2020-06-02 | 1        | Mask                         |

---

#### **Explanation:**

- For `2020-05-30`: Sold items were (`Headphone`, `Basketball`, `T-Shirt`).  
  Sorting them lexicographically gives `Basketball,Headphone,T-Shirt`.  

- For `2020-06-01`: Sold items were (`Pencil`, `Bible`).  
  Sorting them lexicographically gives `Bible,Pencil`.  

- For `2020-06-02`: The sold item is `Mask`.  
  Since there is only one product, we just return it.  


In [None]:
# 360 ms solution
def categorize_products(activities: pd.DataFrame) -> pd.DataFrame:
    # Collect each date's distinct products as a sorted list
    grouped = activities.groupby('sell_date')['product'].apply(lambda x: sorted(set(x))).reset_index()
    # Number of distinct products per date
    grouped['num_sold'] = grouped['product'].apply(len)
    # Comma-separated product names
    grouped['products'] = grouped['product'].apply(lambda x: ",".join(x))
    return grouped.drop(columns=['product']).sort_values(by='sell_date')

In [None]:
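# 250 ms solution
def categorize_products(activities: pd.DataFrame) -> pd.DataFrame:
    # One pass: 'nunique' gives num_sold; the lambda dedups, sorts, and joins the product names.
    # groupby sorts by sell_date by default, so the rows come out in date order.
    prod_list = activities.groupby(['sell_date'], as_index=False)['product'].agg(
        ['nunique', lambda x: ','.join(sorted(set(x)))])
    # Rename to the required output columns
    prod_list.columns = ['sell_date', 'num_sold', 'products']
    return prod_list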
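The same result can also be written with named aggregation, where each keyword argument of `agg` becomes an output column defined by a `(column, function)` pair, so no renaming step is needed. This is a sketch of an alternative formulation, not a timed submission; `categorize_products_named` is a hypothetical name, and the example dates are kept as strings for simplicity (the real table uses a date type).

```python
import pandas as pd

# Example `Activities` table from the problem statement.
activities = pd.DataFrame({
    "sell_date": ["2020-05-30", "2020-06-01", "2020-06-02", "2020-05-30",
                  "2020-06-01", "2020-06-02", "2020-05-30"],
    "product": ["Headphone", "Pencil", "Mask", "Basketball",
                "Bible", "Mask", "T-Shirt"],
})

def categorize_products_named(activities: pd.DataFrame) -> pd.DataFrame:
    # Named aggregation: output_column=(source_column, aggregation).
    return (
        activities.groupby("sell_date", as_index=False)
        .agg(
            num_sold=("product", "nunique"),
            # Deduplicate, sort lexicographically, then join with commas.
            products=("product", lambda x: ",".join(sorted(set(x)))),
        )
        .sort_values("sell_date")
    )

result = categorize_products_named(activities)
print(result)
```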
    