# 196. Delete Duplicate Emails

### Difficulty
**Easy**

---

## Problem Statement

Given a `Person` table, write a **SQL query** to **delete all duplicate emails**, keeping, for each email, only **the row with the smallest `id`**.

- **For SQL users**: Write a `DELETE` statement, **not** a `SELECT` statement.
- **For Pandas users**: Modify the `Person` table **in place**.

**Note**:  
- The final order of the `Person` table **does not matter**.

---

## Table Schema

### **Table: Person**
| Column Name | Type    |
|-------------|---------|
| `id`        | `int`   |
| `email`     | `varchar` |

- `id` is the **primary key** (column with unique values) for this table.
- Each row contains an **email address**.
- The **emails do not contain uppercase letters**.

---

## Example

### **Input**
#### **Person table:**
| id  | email             |
|-----|------------------|
| 1   | john@example.com |
| 2   | bob@example.com  |
| 3   | john@example.com |

---

### **Output**
| id  | email             |
|-----|------------------|
| 1   | john@example.com |
| 2   | bob@example.com  |

---

### **Explanation**
- The email `"john@example.com"` appears **twice** in the `Person` table.
- We **keep only the row with the smallest `id`** (`id = 1`) and **delete the duplicate** (`id = 3`).
- `"bob@example.com"` is unique, so it remains.

---

## **Constraints**
- The `email` column is **not NULL**.
- The `id` column contains unique values.

---


# Solution

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Smallest id for each email; these identify the rows to keep
    min_ids = person.groupby('email')['id'].min()
    # Drop every row whose id is not a group minimum, modifying person in place
    person.drop(person[~person['id'].isin(min_ids)].index, inplace=True)
```
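As a quick check, the function can be run against the example table from the problem statement. The sketch below repeats the function so that it is self-contained; the variable name `person` and the `print` call are only for illustration.

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Smallest id per email; ids are unique, so membership identifies the rows to keep
    min_ids = person.groupby('email')['id'].min()
    person.drop(person[~person['id'].isin(min_ids)].index, inplace=True)

# Example input from the problem statement
person = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['john@example.com', 'bob@example.com', 'john@example.com'],
})

delete_duplicate_emails(person)  # returns None; person itself is modified
print(person)                    # rows with id 1 and id 2 remain
```

Because the function returns `None`, the result has to be read from `person` itself rather than from the return value.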

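## Time & Space Complexity
| **Operation** | **Time Complexity** | **Space Complexity** | **Why?** |
|--------------|-------------------|-------------------|---------|
| **`groupby('email')['id'].min()`** | **O(n)** (average) | **O(u)** | Hash-based grouping of `n` rows into `u` unique emails; the default `sort=True` additionally orders the `u` group keys |
| **`isin(min_ids)` filtering** | **O(n)** (average) | **O(n)** | Checks `n` ids against the `u` minimum ids and builds a boolean mask of length `n` |
| **`drop(inplace=True)`** | **O(n)** | **O(n) (worst case)** | pandas typically builds the filtered result internally before updating `person`, so `inplace=True` does not guarantee O(1) extra memory |
| **Total Complexity** | **O(n)** | **O(n) (worst case)** | Every step is linear; the mask and the filtered result dominate memory |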
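For comparison, a common sort-based variant is sketched below. It is not the approach used above: it sorts by `id` first, which costs O(n log n), and then keeps the first row seen for each email. The function name `delete_duplicate_emails_sorted` is hypothetical, and the input rows are deliberately shuffled to show that the sort is what guarantees the smallest `id` survives.

```python
import pandas as pd

def delete_duplicate_emails_sorted(person: pd.DataFrame) -> None:
    # Hypothetical variant: after sorting by id, the first row
    # for each email is the one with the smallest id
    person.sort_values(by='id', inplace=True)
    person.drop_duplicates(subset='email', keep='first', inplace=True)

# Same data as the problem's example, in shuffled row order
person = pd.DataFrame({
    'id': [3, 1, 2],
    'email': ['john@example.com', 'john@example.com', 'bob@example.com'],
})

delete_duplicate_emails_sorted(person)
print(person)  # rows with id 1 and id 2 remain
```

This version reorders the rows, which is acceptable here because the final order of the `Person` table does not matter.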

---

# Alternative Solution

In most real-world situations, returning a new DataFrame is preferred over modifying the input:

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # Keep only the rows whose id is the smallest id for their email
    return person[person['id'].isin(person.groupby('email')['id'].min())].copy()
```

### Why This is Better for Real-World Use:

- Does not modify the original DataFrame, avoiding unexpected side effects.
- Caller must explicitly reassign the DataFrame to store changes.
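The two points above can be seen in a short sketch: the input keeps all of its rows, and the deduplicated table exists only in the returned value. The names `original` and `deduped` are illustrative.

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # Return a filtered copy; the input DataFrame is left untouched
    return person[person['id'].isin(person.groupby('email')['id'].min())].copy()

original = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['john@example.com', 'bob@example.com', 'john@example.com'],
})

deduped = delete_duplicate_emails(original)  # caller stores the result explicitly

print(len(original))  # 3: the input still holds the duplicate row
print(len(deduped))   # 2: only the returned copy is deduplicated
```

Note that this version would not satisfy the problem as stated for Pandas users, since it does not modify `Person` in place.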

### **Time & Space Complexity: In-Place vs. Copying Approach**

| Approach | **Time Complexity** | **Space Complexity** | **Best For?** |
|----------|-------------------|-----------------|--------------|
| **In-Place Modification** | **O(n)** | **O(n) (worst case)** | Callers that expect the passed DataFrame itself to change, as this problem requires |
| **Returning a Copy** | **O(n)** | **O(n) (always)** | Real-world scenarios where avoiding side effects is important |

**If the caller expects the passed DataFrame to be mutated, modify it in place (`inplace=True`); any memory savings are not guaranteed, because pandas may still allocate intermediate data.**  
**If working in a functional programming style, returning a copy is better.**  