# Advanced Pandas 6

In [None]:
import pandas as pd

# 1693. Daily Leads and Partners  

**Difficulty**: Easy  
**Topics**: SQL, Pandas  

## Schema  

### Table: `DailySales`  
| Column Name | Type    |  
|-------------|---------|  
| `date_id`   | date    |  
| `make_name` | varchar |  
| `lead_id`   | int     |  
| `partner_id`| int     |  

- There is no primary key (column with unique values) for this table.  
- The table may contain duplicates.  
- This table contains the date and the name of the product sold and the IDs of the lead and partner it was sold to.  
- The `make_name` consists of only lowercase English letters.  

---

## Task  

Write a solution to calculate the number of distinct `lead_id`'s and `partner_id`'s for each `date_id` and `make_name`.  

Return the result table in **any order**.

---

## Example  

### Input:  
`DailySales` table:  
| date_id    | make_name | lead_id | partner_id |  
|------------|-----------|---------|------------|  
| 2020-12-8  | toyota    | 0       | 1          |  
| 2020-12-8  | toyota    | 1       | 0          |  
| 2020-12-8  | toyota    | 1       | 2          |  
| 2020-12-7  | toyota    | 0       | 2          |  
| 2020-12-7  | toyota    | 0       | 1          |  
| 2020-12-8  | honda     | 1       | 2          |  
| 2020-12-8  | honda     | 2       | 1          |  
| 2020-12-7  | honda     | 0       | 1          |  
| 2020-12-7  | honda     | 1       | 2          |  
| 2020-12-7  | honda     | 2       | 1          |  

---

### Output:  
| date_id    | make_name | unique_leads | unique_partners |  
|------------|-----------|--------------|-----------------|  
| 2020-12-8  | toyota    | 2            | 3               |  
| 2020-12-7  | toyota    | 1            | 2               |  
| 2020-12-8  | honda     | 2            | 2               |  
| 2020-12-7  | honda     | 3            | 2               |  

---

### Explanation:  

#### Date: 2020-12-8  
- **Toyota**  
  - Leads: `[0, 1]`  
  - Partners: `[0, 1, 2]`  
  - Unique leads: **2**, Unique partners: **3**  
- **Honda**  
  - Leads: `[1, 2]`  
  - Partners: `[1, 2]`  
  - Unique leads: **2**, Unique partners: **2**  

#### Date: 2020-12-7  
- **Toyota**  
  - Leads: `[0]`  
  - Partners: `[1, 2]`  
  - Unique leads: **1**, Unique partners: **2**  
- **Honda**  
  - Leads: `[0, 1, 2]`  
  - Partners: `[1, 2]`  
  - Unique leads: **3**, Unique partners: **2**  


In [None]:
# 462 ms
def daily_leads_and_partners(daily_sales: pd.DataFrame) -> pd.DataFrame:
    # Named aggregation: count the distinct leads and partners per (date, make) group.
    sales = (
        daily_sales.groupby(['date_id', 'make_name'])
        .agg(unique_leads=('lead_id', 'nunique'), unique_partners=('partner_id', 'nunique'))
        .reset_index()
    )
    return sales


In [None]:
# Solution 2: 305 ms
def daily_leads_and_partners(daily_sales: pd.DataFrame) -> pd.DataFrame:
    return daily_sales.groupby(
        ['date_id', 'make_name']
    ).nunique().reset_index().rename(columns={
        'lead_id': 'unique_leads',
        'partner_id': 'unique_partners',
    })
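
As a quick check, here is a minimal sketch that runs both approaches on the example input from the problem statement. The variable names `named_agg` and `renamed` are illustrative only. Note that `groupby` sorts the group keys by default, so the rows come out in a different order than the expected output table, which is fine because any order is accepted.

```python
import pandas as pd

# Example input from the problem statement.
daily_sales = pd.DataFrame({
    "date_id": ["2020-12-8", "2020-12-8", "2020-12-8", "2020-12-7", "2020-12-7",
                "2020-12-8", "2020-12-8", "2020-12-7", "2020-12-7", "2020-12-7"],
    "make_name": ["toyota"] * 5 + ["honda"] * 5,
    "lead_id": [0, 1, 1, 0, 0, 1, 2, 0, 1, 2],
    "partner_id": [1, 0, 2, 2, 1, 2, 1, 1, 2, 1],
})

# Approach 1: named aggregation with explicit output column names.
named_agg = (
    daily_sales.groupby(["date_id", "make_name"])
    .agg(unique_leads=("lead_id", "nunique"), unique_partners=("partner_id", "nunique"))
    .reset_index()
)

# Approach 2: nunique() on every non-key column, then rename.
renamed = (
    daily_sales.groupby(["date_id", "make_name"])
    .nunique()
    .reset_index()
    .rename(columns={"lead_id": "unique_leads", "partner_id": "unique_partners"})
)

print(named_agg)
print(renamed)
```

The second approach relies on `lead_id` and `partner_id` being the only non-key columns; if the table had extra columns, `nunique()` would count those as well.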

# 1050. Actors and Directors Who Cooperated At Least Three Times

**Difficulty:** Easy  
**Topics:** SQL, Pandas  

---

### Table: ActorDirector

| Column Name | Type    |
|-------------|---------|
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |

- *timestamp* is the primary key (column with unique values) for this table.

---

### Problem

Write a solution to find all the pairs `(actor_id, director_id)` where the actor has cooperated with the director at least three times.

Return the result table in any order.

The result format is in the following example.

---

### Example 1:

**Input:**  
ActorDirector table:

| actor_id    | director_id | timestamp   |
|-------------|-------------|-------------|
| 1           | 1           | 0           |
| 1           | 1           | 1           |
| 1           | 1           | 2           |
| 1           | 2           | 3           |
| 1           | 2           | 4           |
| 2           | 1           | 5           |
| 2           | 1           | 6           |

**Output:**  

| actor_id    | director_id |
|-------------|-------------|
| 1           | 1           |

**Explanation:**  
The only pair is `(1, 1)` where they cooperated exactly 3 times.


In [None]:
# 263 ms
def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
    result = (
        actor_director.groupby(['actor_id', 'director_id'])
        # keep every row of each pair that appears at least three times
        .filter(lambda x: len(x) >= 3)
        # collapse the surviving rows to one row per pair
        .drop_duplicates(subset=['actor_id', 'director_id'])
    )
    return result[['actor_id', 'director_id']]



In [None]:
# 243 ms
def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
    # Count the rows per (actor_id, director_id) pair, then keep the pairs with count >= 3.
    df = actor_director.groupby(["actor_id", "director_id"]).size().reset_index(name="count")
    return df[df["count"] >= 3][["actor_id", "director_id"]]
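
To make the counting step concrete, the following sketch applies the `size()` approach to the example input. The names `pair_counts` and `frequent_pairs` are illustrative and not part of the original solutions.

```python
import pandas as pd

# Example input from the problem statement.
actor_director = pd.DataFrame({
    "actor_id": [1, 1, 1, 1, 1, 2, 2],
    "director_id": [1, 1, 1, 2, 2, 1, 1],
    "timestamp": [0, 1, 2, 3, 4, 5, 6],
})

# One row per (actor_id, director_id) pair with the number of cooperations.
pair_counts = (
    actor_director.groupby(["actor_id", "director_id"])
    .size()
    .reset_index(name="count")
)

# Keep only the pairs that cooperated at least three times.
frequent_pairs = pair_counts.loc[pair_counts["count"] >= 3, ["actor_id", "director_id"]]

print(pair_counts)
print(frequent_pairs)
```

Here `pair_counts` holds the counts 3, 2 and 2 for the pairs `(1, 1)`, `(1, 2)` and `(2, 1)`, so only `(1, 1)` passes the threshold.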

# 1378. Replace Employee ID With The Unique Identifier  
**Difficulty:** Easy  
**Topics:** SQL, Pandas  

---

#### **Table: Employees**  

| Column Name | Type    |  
|-------------|---------|  
| id          | int     |  
| name        | varchar |  

- `id` is the primary key (column with unique values) for this table.  
- Each row of this table contains the `id` and the `name` of an employee in a company.  

---

#### **Table: EmployeeUNI**  

| Column Name | Type |  
|-------------|------|  
| id          | int  |  
| unique_id   | int  |  

- `(id, unique_id)` is the primary key (combination of columns with unique values) for this table.  
- Each row of this table contains the `id` and the corresponding `unique_id` of an employee in the company.  

---

### **Problem Statement**  

Write a solution to show the `unique_id` of each user. If a user does not have a `unique_id`, show `null` instead.  

- Return the result table in any order.  

---

### **Example**  

#### **Input:**  

**Employees table:**  

| id | name     |  
|----|----------|  
| 1  | Alice    |  
| 7  | Bob      |  
| 11 | Meir     |  
| 90 | Winston  |  
| 3  | Jonathan |  

**EmployeeUNI table:**  

| id | unique_id |  
|----|-----------|  
| 3  | 1         |  
| 11 | 2         |  
| 90 | 3         |  

---

#### **Output:**  

| unique_id | name     |  
|-----------|----------|  
| null      | Alice    |  
| null      | Bob      |  
| 2         | Meir     |  
| 3         | Winston  |  
| 1         | Jonathan |  

---

### **Explanation:**  

- Alice and Bob do not have a `unique_id`, so we show `null` instead.  
- The `unique_id` of Meir is `2`.  
- The `unique_id` of Winston is `3`.  
- The `unique_id` of Jonathan is `1`.  


In [None]:
# 907 ms
def replace_employee_id(employees: pd.DataFrame, employee_uni: pd.DataFrame) -> pd.DataFrame:
    # A left merge on `id` keeps every employee; unmatched rows already get NaN
    # in `unique_id`, so no extra fill step is needed.
    result = pd.merge(employees, employee_uni, on="id", how="left")
    return result[['unique_id', 'name']]

In [None]:
# 283 ms
def replace_employee_id(employees: pd.DataFrame, employee_uni: pd.DataFrame) -> pd.DataFrame:
    return pd.merge(employees,employee_uni,on='id',how='left')[['unique_id','name']]  
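
The sketch below runs the left merge on the example input to show what happens to employees without a match. The final cast to the nullable `Int64` dtype is an optional extra, not part of the original solutions: it assumes you want whole numbers displayed alongside missing values instead of floats.

```python
import pandas as pd

# Example input from the problem statement.
employees = pd.DataFrame({
    "id": [1, 7, 11, 90, 3],
    "name": ["Alice", "Bob", "Meir", "Winston", "Jonathan"],
})
employee_uni = pd.DataFrame({"id": [3, 11, 90], "unique_id": [1, 2, 3]})

# Left merge: every row of `employees` survives, unmatched ids get NaN.
merged = employees.merge(employee_uni, on="id", how="left")[["unique_id", "name"]]

# NaN has no integer representation, so the column was upcast to float64.
print(merged.dtypes)

# Optional (assumption): cast to the nullable integer dtype to show <NA> and ints.
as_nullable = merged.astype({"unique_id": "Int64"})
print(as_nullable)
```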

# 1280. Students and Examinations

**Difficulty**: Easy  
**Topics**: SQL, Data Manipulation  

---

## **Table: Students**

| Column Name   | Type    |  
|---------------|---------|  
| student_id    | int     |  
| student_name  | varchar |  

- `student_id` is the primary key (column with unique values) for this table.  
- Each row of this table contains the ID and the name of one student in the school.  

---

## **Table: Subjects**

| Column Name  | Type    |  
|--------------|---------|  
| subject_name | varchar |  

- `subject_name` is the primary key (column with unique values) for this table.  
- Each row of this table contains the name of one subject in the school.  

---

## **Table: Examinations**

| Column Name  | Type    |  
|--------------|---------|  
| student_id   | int     |  
| subject_name | varchar |  

- There is no primary key (column with unique values) for this table. It may contain duplicates.  
- Each student from the Students table takes every course from the Subjects table.  
- Each row of this table indicates that a student with ID `student_id` attended the exam of `subject_name`.  

---

## **Problem Statement**

Write a solution to find the number of times each student attended each exam.

---

### **Return Format**
Return the result table ordered by `student_id` and `subject_name`.

---

### **Example 1**

#### **Input**:
**Students table**:
| student_id | student_name |  
|------------|--------------|  
| 1          | Alice        |  
| 2          | Bob          |  
| 13         | John         |  
| 6          | Alex         |  

**Subjects table**:
| subject_name |  
|--------------|  
| Math         |  
| Physics      |  
| Programming  |  

**Examinations table**:
| student_id | subject_name |  
|------------|--------------|  
| 1          | Math         |  
| 1          | Physics      |  
| 1          | Programming  |  
| 2          | Programming  |  
| 1          | Physics      |  
| 1          | Math         |  
| 13         | Math         |  
| 13         | Programming  |  
| 13         | Physics      |  
| 2          | Math         |  
| 1          | Math         |  

---

#### **Output**:
| student_id | student_name | subject_name | attended_exams |  
|------------|--------------|--------------|----------------|  
| 1          | Alice        | Math         | 3              |  
| 1          | Alice        | Physics      | 2              |  
| 1          | Alice        | Programming  | 1              |  
| 2          | Bob          | Math         | 1              |  
| 2          | Bob          | Physics      | 0              |  
| 2          | Bob          | Programming  | 1              |  
| 6          | Alex         | Math         | 0              |  
| 6          | Alex         | Physics      | 0              |  
| 6          | Alex         | Programming  | 0              |  
| 13         | John         | Math         | 1              |  
| 13         | John         | Physics      | 1              |  
| 13         | John         | Programming  | 1              |  

---

#### **Explanation**:
- The result table should contain all students and all subjects.  
- Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.  
- Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics exam.  
- Alex did not attend any exams.  
- John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.


In [None]:
# 514 ms
def students_and_examinations(students: pd.DataFrame, subjects: pd.DataFrame, examinations: pd.DataFrame) -> pd.DataFrame:
    safe = students.merge(subjects, how='cross')
    exam_counts = examinations.groupby(['student_id', 'subject_name']).size().reset_index(name='attended_exams')
    result = safe.merge(exam_counts, on=['student_id', 'subject_name'], how='left')
    result['attended_exams'] = result['attended_exams'].fillna(0).astype(int)
    result = result.sort_values(by=['student_id', 'subject_name']).reset_index(drop=True)
    return result

In [None]:
# 310 ms
def students_and_examinations(students: pd.DataFrame, subjects: pd.DataFrame, examinations: pd.DataFrame) -> pd.DataFrame:

    # We need a MultiIndex holding the Cartesian product of students and subjects.
    multi_index = pd.MultiIndex.from_product(
        [students['student_id'].unique(), subjects['subject_name']],
        names=['student_id', 'subject_name'],
    )

    # Count each (student_id, subject_name) pair, reindex to add the zero counts,
    # then reset the index to get a regular DataFrame with a named count column.
    attended_exams = (
        examinations.value_counts(subset=['student_id', 'subject_name'])
        .reindex(multi_index, fill_value=0)
        .reset_index(name='attended_exams')
    )

    # Merge to add student_name, sort by student_id and then subject_name,
    # and reorder the columns.
    attended_exams = (
        pd.merge(attended_exams, students, on='student_id')
        .sort_values(['student_id', 'subject_name'], ascending=True)
        [['student_id', 'student_name', 'subject_name', 'attended_exams']]
    )


    return attended_exams
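
The cross-join approach can be traced step by step on the example input, as in the minimal sketch below. It assumes a pandas version that supports `how="cross"` in `merge` (1.2 or later); the intermediate names `pairs` and `counts` are illustrative.

```python
import pandas as pd

# Example input from the problem statement.
students = pd.DataFrame({
    "student_id": [1, 2, 13, 6],
    "student_name": ["Alice", "Bob", "John", "Alex"],
})
subjects = pd.DataFrame({"subject_name": ["Math", "Physics", "Programming"]})
examinations = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 1, 1, 13, 13, 13, 2, 1],
    "subject_name": ["Math", "Physics", "Programming", "Programming", "Physics", "Math",
                     "Math", "Programming", "Physics", "Math", "Math"],
})

# Step 1: every (student, subject) combination, 4 x 3 = 12 rows.
pairs = students.merge(subjects, how="cross")

# Step 2: attendance counts for the pairs that actually occur.
counts = (
    examinations.groupby(["student_id", "subject_name"])
    .size()
    .reset_index(name="attended_exams")
)

# Step 3: left merge so that pairs without any exam survive, then fill them with 0.
result = pairs.merge(counts, on=["student_id", "subject_name"], how="left")
result["attended_exams"] = result["attended_exams"].fillna(0).astype(int)
result = result.sort_values(["student_id", "subject_name"]).reset_index(drop=True)

print(result)
```

The left merge in step 3 is what keeps Alex, who attended no exams, in the result with three zero counts.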