# Handling Duplicates in Pandas
This notebook demonstrates how to handle duplicate rows in a DataFrame using Pandas.

## Importing Required Libraries
We start by importing the Pandas library.

In [None]:
import pandas as pd

## Creating a Sample DataFrame
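The sample DataFrame contains two kinds of duplicates: rows that are identical in every column (for example, "Ana Silva" with a salary of 3000 appears twice), and rows that share the same `nome` and `sobrenome` but differ in `salario` ("Bruno Souza" appears with 4500 and with 4700).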

In [None]:
df = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Ana", "Carlos", "Bruno", "Diana", "Ana", "Carlos", "Bruno"],
    "sobrenome": ["Silva", "Souza", "Silva", "Oliveira", "Souza", "Costa", "Souza", "Oliveira", "Mendes"],
    "salario": [3000, 4500, 3000, 5000, 4700, 4700, 3000, 5000, 4500]
})
df
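Before removing anything, it can help to see which rows pandas treats as duplicates. The sketch below uses `DataFrame.duplicated`, which marks every repeat of an earlier row as `True`. The variable name `sample` is only an illustrative stand-in that repeats the data above so the snippet runs on its own.

```python
import pandas as pd

# Illustrative copy of the sample data (the name `sample` is an assumption).
sample = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Ana", "Carlos", "Bruno", "Diana", "Ana", "Carlos", "Bruno"],
    "sobrenome": ["Silva", "Souza", "Silva", "Oliveira", "Souza", "Costa", "Souza", "Oliveira", "Mendes"],
    "salario": [3000, 4500, 3000, 5000, 4700, 4700, 3000, 5000, 4500],
})

# True for rows that repeat an earlier row in every column.
full_dups = sample.duplicated()

# True for rows that repeat an earlier (nome, sobrenome) pair, whatever the salary.
name_dups = sample.duplicated(subset=["nome", "sobrenome"])

print(full_dups.sum())  # 2
print(name_dups.sum())  # 3
```

The two counts differ because the second "Bruno Souza" row has a different salary, so it only counts as a duplicate when the comparison is restricted to the name columns.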

## Removing Duplicate Rows
`drop_duplicates()` removes rows that are identical in every column, keeping the first occurrence by default. Pass `keep="last"` to keep the last occurrence instead. The call returns a new DataFrame and leaves `df` itself unchanged.

In [None]:
# Remove duplicate rows and keep the first occurrence.
df.drop_duplicates()
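As a rough comparison of the `keep` options, the sketch below applies each one to an illustrative copy of the data (again named `sample`, an assumption made so the snippet is self-contained) and prints which original index labels survive. `keep=False` is a third option that drops every row that has a duplicate.

```python
import pandas as pd

# Illustrative copy of the sample data (the name `sample` is an assumption).
sample = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Ana", "Carlos", "Bruno", "Diana", "Ana", "Carlos", "Bruno"],
    "sobrenome": ["Silva", "Souza", "Silva", "Oliveira", "Souza", "Costa", "Souza", "Oliveira", "Mendes"],
    "salario": [3000, 4500, 3000, 5000, 4700, 4700, 3000, 5000, 4500],
})

first = sample.drop_duplicates()             # default, same as keep="first"
last = sample.drop_duplicates(keep="last")   # keep the last copy of each duplicate
neither = sample.drop_duplicates(keep=False)  # drop all copies of duplicated rows

print(first.index.tolist())    # [0, 1, 3, 4, 5, 6, 8]
print(last.index.tolist())     # [1, 2, 4, 5, 6, 7, 8]
print(neither.index.tolist())  # [1, 4, 5, 6, 8]
```

All three results contain "Bruno Souza" twice, because those rows differ in `salario` and so are not duplicates when every column is compared.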

## Removing Duplicates Based on Specific Columns
To keep one row per person, we sort the DataFrame by `salario` in descending order and then drop duplicates on the combination of `nome` and `sobrenome`. Because the highest salary comes first within each pair, `keep="first"` retains that row.

In [None]:
# Sort by salary (highest first), then keep the first row for each ('nome', 'sobrenome') pair.
df = (df.sort_values("salario", ascending=False)
        .drop_duplicates(keep="first", subset=["nome", "sobrenome"]))
df
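One way to sanity-check the sort-then-deduplicate step is to compare it with a `groupby` that takes the maximum salary per name pair. This is only a sketch of an alternative, not a replacement for the approach above: the `groupby` version returns just the salary per pair, while `drop_duplicates` keeps whole rows. The names `sample`, `by_sort`, and `by_group` are illustrative.

```python
import pandas as pd

# Illustrative copy of the sample data (the name `sample` is an assumption).
sample = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Ana", "Carlos", "Bruno", "Diana", "Ana", "Carlos", "Bruno"],
    "sobrenome": ["Silva", "Souza", "Silva", "Oliveira", "Souza", "Costa", "Souza", "Oliveira", "Mendes"],
    "salario": [3000, 4500, 3000, 5000, 4700, 4700, 3000, 5000, 4500],
})

# Approach from the notebook: sort descending, keep the first row per pair.
deduped = (sample.sort_values("salario", ascending=False)
                 .drop_duplicates(subset=["nome", "sobrenome"], keep="first"))
by_sort = deduped.set_index(["nome", "sobrenome"])["salario"].to_dict()

# Alternative: highest salary per (nome, sobrenome) pair via groupby.
by_group = sample.groupby(["nome", "sobrenome"])["salario"].max().to_dict()

print(by_sort == by_group)             # True
print(by_group[("Bruno", "Souza")])    # 4700
```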

## Final DataFrame
The resulting DataFrame has one row per (`nome`, `sobrenome`) pair, six rows in total. Where a pair appeared more than once, the row with the highest salary is the one retained.

In [None]:
df
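Note that the deduplicated rows keep the index labels they had in the original DataFrame, now in sorted order. If a clean 0-based index is preferred, a minimal sketch is to call `reset_index(drop=True)` on the result. The names `sample`, `deduped`, and `tidy` are illustrative.

```python
import pandas as pd

# Illustrative copy of the sample data (the name `sample` is an assumption).
sample = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Ana", "Carlos", "Bruno", "Diana", "Ana", "Carlos", "Bruno"],
    "sobrenome": ["Silva", "Souza", "Silva", "Oliveira", "Souza", "Costa", "Souza", "Oliveira", "Mendes"],
    "salario": [3000, 4500, 3000, 5000, 4700, 4700, 3000, 5000, 4500],
})

deduped = (sample.sort_values("salario", ascending=False)
                 .drop_duplicates(subset=["nome", "sobrenome"], keep="first"))

# Discard the old index labels and number the rows from 0.
tidy = deduped.reset_index(drop=True)

print(tidy.index.tolist())  # [0, 1, 2, 3, 4, 5]
```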