# Problem: Delete Duplicate Emails

#### Table: `Person`

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| email       | varchar |

id is the primary key (column with unique values) for this table.  
Each row of this table contains an email. The emails will not contain uppercase letters.  
 
### Task
**Write a solution to delete all duplicate emails, keeping only one row per email: the one with the smallest `id`.** 

**For SQL users, please note that you are supposed to write a `DELETE` statement and not a `SELECT` one.** 

**For Pandas users, please note that you are supposed to modify `Person` in place.**

**After running your script, the answer shown is the `Person` table. The driver will first compile and run your piece of code and then show the `Person` table. The final order of the `Person` table does not matter.** 

**The result format is in the following example.** 


### Example 1:

### Input: 

#### Person table:

| id | email            |
|----|------------------|
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |

### Output: 

| id | email            |
|----|------------------|
| 1  | john@example.com |
| 2  | bob@example.com  |

**Explanation**: john@example.com appears twice. We keep the row with the smallest `id`, which is 1.

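The task also describes a Pandas variant that modifies `Person` in place. Below is a minimal sketch, assuming the table arrives as a DataFrame with lowercase `id` and `email` columns; the function name `delete_duplicate_emails` is an illustrative label, not something defined elsewhere in this notebook.

```python
import pandas as pd


def delete_duplicate_emails(person: pd.DataFrame) -> None:
    """Keep only the smallest id per email, modifying `person` in place (hypothetical helper)."""
    # Sort by id so the first occurrence of each email is the one with the smallest id
    person.sort_values(by="id", inplace=True)
    # Drop every later occurrence of an email; keep="first" retains the smallest id
    person.drop_duplicates(subset="email", keep="first", inplace=True)


# Example 1 from the problem statement
person = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["john@example.com", "bob@example.com", "john@example.com"],
})
delete_duplicate_emails(person)
print(person)
```

Sorting first makes `keep="first"` mean "smallest `id`" regardless of the incoming row order; both calls use `inplace=True` because the task asks for `Person` to be changed rather than a new frame returned.
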
```python
import sqlite3
import pandas as pd

# Connect to a database (or create it if it doesn't exist)
conn = sqlite3.connect('example.db')

# Create a cursor object
cursor = conn.cursor()

# Create the Person table if it is not already there
cursor.execute('CREATE TABLE IF NOT EXISTS Person (Id INT, Email VARCHAR(255))')
```

```python
# Delete all existing rows so the cell can be re-run
cursor.execute('DELETE FROM Person')

# Create data
data_to_insert = [
    (1, 'john@example.com'),
    (2, 'bob@example.com'),
    (3, 'john@example.com'),
]

# Insert data
cursor.executemany('''
INSERT INTO Person (id, email)
VALUES (?, ?)
''', data_to_insert)

# Commit changes
conn.commit()

# Check Person table
df = pd.read_sql_query('select * from Person', conn)
df
```

|   | Id | Email            |
|---|----|------------------|
| 0 | 1  | john@example.com |
| 1 | 2  | bob@example.com  |
| 2 | 3  | john@example.com |


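```python
# Delete every row whose email also appears on a row with a smaller id.
# The self-join pairs rows sharing an email; p1.id > p2.id picks the larger-id
# duplicates, so the smallest id per email survives.
# SQLite accepts a subquery on the table being deleted from; MySQL rejects this
# form (error 1093) and needs a multi-table DELETE instead.
query = '''
delete from Person
where id in (
    select p1.id
    from Person p1
    join Person p2 on p1.email = p2.email
    where p1.id > p2.id
)
'''

cursor.execute(query)
conn.commit()  # persist the deletion to example.db

# Query data
df = pd.read_sql_query('select * from Person', conn)

# Display the resulting table
df
```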

|   | Id | Email            |
|---|----|------------------|
| 0 | 1  | john@example.com |
| 1 | 2  | bob@example.com  |

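To see which rows the self-join targets, its inner query can be run on its own as a `SELECT`. The sketch below assumes a fresh in-memory database loaded with the Example 1 rows; `demo_conn`, `pairs`, and `ids_to_delete` are hypothetical names, chosen so the notebook's `conn` is not shadowed.

```python
import sqlite3

# Separate in-memory database so the notebook's `conn` is untouched
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Person (id INT, email VARCHAR(255))")
demo_conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")],
)

# The inner query of the DELETE, run as a plain SELECT. Each row is paired with
# every row sharing its email; p1.id > p2.id keeps only pairs where p1 has a
# smaller-id twin, i.e. p1 is a duplicate to delete.
pairs = demo_conn.execute(
    """
    SELECT p1.id, p2.id
    FROM Person p1
    JOIN Person p2 ON p1.email = p2.email
    WHERE p1.id > p2.id
    """
).fetchall()

# Distinct ids the DELETE would remove
ids_to_delete = sorted({p1_id for p1_id, _ in pairs})
print(pairs, ids_to_delete)
demo_conn.close()
```

Each pair reads as (duplicate id, smaller id with the same email). If an email occurred three or more times, a larger id would show up in several pairs; `IN` matches it once either way, so the `DELETE` is unaffected.
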

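```python
# Alternative using SQLite's implicit rowid.
# Re-insert the sample rows first: the previous cell already removed the duplicate.
cursor.execute('DELETE FROM Person')
cursor.executemany('INSERT INTO Person (id, email) VALUES (?, ?)', data_to_insert)

# Keep the row with the smallest rowid in each email group and delete the rest.
# rowid follows insertion order, so min(rowid) matches the smallest id only because
# these rows were inserted in ascending id order; min(id) follows the task directly.
query = '''
delete from Person
where rowid not in (
    select min(rowid)
    from Person
    group by email
)
'''

cursor.execute(query)
conn.commit()

# Query data
df = pd.read_sql_query('select * from Person', conn)

# Display the resulting table
df
```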

|   | Id | Email            |
|---|----|------------------|
| 0 | 1  | john@example.com |
| 1 | 2  | bob@example.com  |

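The `rowid` version keys on insertion order rather than on `id`. A small sketch with hypothetical rows inserted in descending `id` order shows where `min(rowid)` and `min(id)` diverge; `survivors`, `demo_conn`, and the row values are illustrative assumptions, not part of the original notebook.

```python
import sqlite3

# Separate in-memory database so the notebook's `conn` is untouched
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Person (id INT, email VARCHAR(255))")

# Hypothetical rows inserted in DESCENDING id order, so the john row with
# id 3 receives a smaller rowid than the john row with id 1
rows = [(3, "john@example.com"), (2, "bob@example.com"), (1, "john@example.com")]


def survivors(key: str) -> list:
    """Reload `rows`, delete all but min(key) per email, return the remaining ids.

    `key` is only ever the fixed string "rowid" or "id" here, never user input,
    so formatting it into the SQL text is safe in this sketch.
    """
    demo_conn.execute("DELETE FROM Person")
    demo_conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
    demo_conn.execute(
        f"DELETE FROM Person WHERE {key} NOT IN "
        f"(SELECT min({key}) FROM Person GROUP BY email)"
    )
    return sorted(row[0] for row in demo_conn.execute("SELECT id FROM Person"))


by_rowid = survivors("rowid")  # keeps the first-inserted john row (id 3)
by_id = survivors("id")        # keeps the smallest id per email, as the task asks
print(by_rowid, by_id)
demo_conn.close()
```

With this insertion order the `rowid` query keeps `id` 3 for john@example.com, which violates the task, while grouping on `min(id)` keeps `id` 1.
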

```python
# Close the connection
conn.close()
```