# 🧹 Data Cleaning and Preprocessing - Pokémon Dataset

This notebook covers the **cleaning and preparation** of the raw Pokémon dataset for subsequent analyses.

## Objectives

1. **Load and inspect** the original dataset (`Pokemon Database.csv`)
2. **Identify and handle** null values, duplicates, and inconsistencies
3. **Standardize column names** to simplify analysis
4. **Remove extra quotation marks** and unwanted characters
5. **Produce a clean dataset** (`pokemon_dataset_cleaned.csv`) ready for modeling

## Cleaning Methodology

- **Initial inspection**: dimensions, data types, null values
- **String cleaning**: removal of special characters
- **Null handling**: analysis, then a decision to keep, remove, or impute
- **Standardization**: consistent column names
- **Validation**: quality checks on the cleaned data
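
The standardization step can be sketched on an empty frame that reuses three of the dataset's column labels. The snake_case convention and the helper name `to_snake_case` are assumptions made for illustration; the naming scheme the notebook finally adopts may differ.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a column label and join its alphanumeric words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words)


# Empty toy frame that only carries column labels
toy = pd.DataFrame(columns=["Pokemon Name", "Game(s) of Origin", "Pre-Evolution Pokemon Id"])
toy = toy.rename(columns=to_snake_case)
print(toy.columns.tolist())
```

Passing a function to `rename(columns=...)` applies it to every label, so no hand-written mapping of all 45 columns is needed.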

---

## 0. Importing Libraries

Load the libraries needed for data manipulation and analysis.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Global settings
warnings.filterwarnings('ignore')
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("Libraries loaded successfully!")

Libraries loaded successfully!


---

## 1. Loading the Original Dataset

We load the raw dataset directly from the CSV file to begin inspection and cleaning.

In [66]:
# Load the raw dataset
df = pd.read_csv("Pokemon Database.csv")

print("✓ Dataset loaded successfully!")

✓ Dataset loaded successfully!


---

## 2. Initial Dataset Inspection

Initial exploratory analysis to understand the dataset's structure, data types, and quality.

### 2.1 Overview of the Data

**First rows of the dataset**

In [4]:
df.head()

Unnamed: 0,Pokemon Id,Pokedex Number,Pokemon Name,Classification,Alternate Form Name,Original Pokemon ID,Legendary Type,Pokemon Height,Pokemon Weight,Primary Type,...,Speed EV,EV Yield Total,Catch Rate,Experience Growth,Experience Growth Total,Primary Egg Group,Secondary Egg Group,Egg Cycle Count,Pre-Evolution Pokemon Id,Evolution Details
0,1,1,"""Bulbasaur""","""Seed Pokémon""",,,,0.7,6.9,"""Grass""",...,0,1,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,,
1,2,2,"""Ivysaur""","""Seed Pokémon""",,,,1.0,13.0,"""Grass""",...,0,2,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,1.0,"""Level 16"""
2,3,3,"""Venusaur""","""Seed Pokémon""",,,,2.0,100.0,"""Grass""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,2.0,"""Level 32"""
3,4,3,"""Venusaur""","""Seed Pokémon""","""Mega""",3.0,,2.4,155.5,"""Grass""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,,
4,1526,3,"""Venusaur""","""Seed Pokémon""","""Gigantamax""",3.0,,24.0,0.0,"""Grass""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,,


**Dataset dimensions**

In [5]:
print("="*80)
print("DATASET DIMENSIONS")
print("="*80)
print(f"\nTotal records (Pokémon): {df.shape[0]}")
print(f"Total columns: {df.shape[1]}")

DATASET DIMENSIONS

Total records (Pokémon): 1382
Total columns: 45


**Data types and null-value information**

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1382 entries, 0 to 1381
Data columns (total 45 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Pokemon Id                         1382 non-null   int64  
 1   Pokedex Number                     1382 non-null   int64  
 2   Pokemon Name                       1382 non-null   object 
 3   Classification                     1382 non-null   object 
 4   Alternate Form Name                328 non-null    object 
 5   Original Pokemon ID                328 non-null    float64
 6   Legendary Type                     157 non-null    object 
 7   Pokemon Height                     1382 non-null   float64
 8   Pokemon Weight                     1382 non-null   float64
 9   Primary Type                       1382 non-null   object 
 10  Secondary Type                     752 non-null    object 
 11  Primary Ability                    1382 non-null   objec

**Complete list of columns**

In [7]:
df.columns.tolist()

['Pokemon Id',
 'Pokedex Number',
 'Pokemon Name',
 'Classification',
 'Alternate Form Name',
 'Original Pokemon ID',
 'Legendary Type',
 'Pokemon Height',
 'Pokemon Weight',
 'Primary Type',
 'Secondary Type',
 'Primary Ability',
 'Primary Ability Description',
 'Secondary Ability',
 'Secondary Ability Description',
 'Hidden Ability',
 'Hidden Ability Description',
 'Special Event Ability',
 'Special Event Ability Description',
 'Male Ratio',
 'Female Ratio',
 'Base Happiness',
 'Game(s) of Origin',
 'Health Stat',
 'Attack Stat',
 'Defense Stat',
 'Special Attack Stat',
 'Special Defense Stat',
 'Speed Stat',
 'Base Stat Total',
 'Health EV',
 'Attack EV',
 'Defense EV',
 'Special Attack EV',
 'Special Defense EV',
 'Speed EV',
 'EV Yield Total',
 'Catch Rate',
 'Experience Growth',
 'Experience Growth Total',
 'Primary Egg Group',
 'Secondary Egg Group',
 'Egg Cycle Count',
 'Pre-Evolution Pokemon Id',
 'Evolution Details']

---

### 2.2 Data Quality Analysis

**Null values per column**

In [None]:
print("="*80)
print("NULL VALUES IN THE DATASET")
print("="*80)
print(df.isnull().sum())
print(f"\nTotal null values in the dataset: {df.isnull().sum().sum()}")

Pokemon Id                              0
Pokedex Number                          0
Pokemon Name                            0
Classification                          0
Alternate Form Name                  1054
Original Pokemon ID                  1054
Legendary Type                       1225
Pokemon Height                          0
Pokemon Weight                          0
Primary Type                            0
Secondary Type                        630
Primary Ability                         0
Primary Ability Description             0
Secondary Ability                     730
Secondary Ability Description         730
Hidden Ability                        338
Hidden Ability Description            338
Special Event Ability                1380
Special Event Ability Description    1380
Male Ratio                              0
Female Ratio                            0
Base Happiness                          0
Game(s) of Origin                       0
Health Stat                       

**Percentage of nulls per column**

In [9]:
null_percentage = (df.isnull().sum() / len(df) * 100).round(2)
print("="*80)
print("PERCENTAGE OF NULL VALUES")
print("="*80)
print(null_percentage[null_percentage > 0])

PERCENTAGE OF NULL VALUES
Alternate Form Name                  76.27
Original Pokemon ID                  76.27
Legendary Type                       88.64
Secondary Type                       45.59
Secondary Ability                    52.82
Secondary Ability Description        52.82
Hidden Ability                       24.46
Hidden Ability Description           24.46
Special Event Ability                99.86
Special Event Ability Description    99.86
Secondary Egg Group                  74.89
Pre-Evolution Pokemon Id             56.01
Evolution Details                    58.68
dtype: float64
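
The count and the percentage can also be read side by side in a single table. This is a minimal sketch on a made-up four-row frame; `toy` and `null_report` are illustrative names, not part of the notebook. It relies on the mean of a boolean mask being the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Made-up sample that reuses three of the dataset's column names
toy = pd.DataFrame({
    "Primary Type": ["Grass", "Fire", "Water", "Bug"],
    "Secondary Type": ["Poison", np.nan, np.nan, "Flying"],
    "Legendary Type": [np.nan, np.nan, np.nan, "Mythical"],
})

null_report = pd.DataFrame({
    "null_count": toy.isna().sum(),
    # Mean of a boolean mask = fraction of nulls per column
    "null_pct": (toy.isna().mean() * 100).round(2),
})

# Keep only columns that have nulls, worst first
null_report = null_report[null_report["null_count"] > 0].sort_values("null_pct", ascending=False)
print(null_report)
```

Sorting by percentage puts columns such as `Special Event Ability` (99.86% null in the real data) at the top, which helps when deciding what to drop.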


**Duplicate check**

In [10]:
duplicados = df.duplicated().sum()
print("="*80)
print("DUPLICATE RECORDS")
print("="*80)
print(f"Number of duplicate records: {duplicados}")
if duplicados == 0:
    print("✓ There are no duplicates in the dataset!")

DUPLICATE RECORDS
Number of duplicate records: 3
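
Since the check reports three fully duplicated rows, it can help to look at them before deciding what to do. The sketch below uses a made-up frame; whether dropping is the right choice for the real dataset is an assumption that needs confirming by inspection.

```python
import pandas as pd

# Made-up sample containing one exact duplicate
toy = pd.DataFrame({
    "Pokemon Id": [1, 2, 2, 3],
    "Pokemon Name": ["Bulbasaur", "Ivysaur", "Ivysaur", "Venusaur"],
})

# keep=False flags every member of a duplicated group, so the copies can be compared side by side
duplicate_rows = toy[toy.duplicated(keep=False)]
print(duplicate_rows)

# drop_duplicates keeps the first occurrence of each group by default
deduplicated = toy.drop_duplicates().reset_index(drop=True)
print(len(toy), "->", len(deduplicated))
```

Note that `duplicated()` without arguments only flags the second and later copies, which is why its sum counts rows to remove rather than rows involved.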


---

### 2.3 Descriptive Statistics

**Numeric variables**

In [11]:
df.describe()

Unnamed: 0,Pokemon Id,Pokedex Number,Original Pokemon ID,Pokemon Height,Pokemon Weight,Male Ratio,Female Ratio,Base Happiness,Health Stat,Attack Stat,...,Attack EV,Defense EV,Special Attack EV,Special Defense EV,Speed EV,EV Yield Total,Catch Rate,Experience Growth Total,Egg Cycle Count,Pre-Evolution Pokemon Id
count,1382.0,1382.0,328.0,1382.0,1382.0,1382.0,1382.0,1382.0,1382.0,1382.0,...,1382.0,1382.0,1382.0,1382.0,1382.0,1382.0,1382.0,1382.0,1382.0,608.0
mean,1026.651954,506.20767,823.338415,1.988495,68.316353,44.328871,37.870839,47.272069,71.506512,80.960926,...,0.539797,0.247467,0.424747,0.22576,0.318379,1.976122,93.745297,1059808.0,30.864689,859.848684
std,574.699143,299.02744,521.455495,5.282591,126.529616,28.610044,26.523509,20.092395,26.480814,31.369489,...,0.931122,0.64659,0.868498,0.642206,0.704754,0.757398,75.734484,153132.8,28.693386,545.418302
min,1.0,1.0,3.0,0.1,0.0,0.0,0.0,0.0,1.0,5.0,...,0.0,0.0,0.0,0.0,0.0,1.0,3.0,600000.0,0.0,1.0
25%,471.25,229.25,325.0,0.5,7.425,25.0,12.5,50.0,54.0,58.0,...,0.0,0.0,0.0,0.0,0.0,1.0,45.0,1000000.0,20.0,361.5
50%,1092.5,514.5,944.5,1.0,27.55,50.0,50.0,50.0,70.0,79.0,...,0.0,0.0,0.0,0.0,0.0,2.0,60.0,1000000.0,20.0,905.5
75%,1536.75,761.75,1267.0,1.6,69.8,50.0,50.0,50.0,85.0,100.0,...,1.0,0.0,0.0,0.0,0.0,3.0,135.0,1250000.0,25.0,1319.25
max,1886.0,1025.0,1875.0,100.0,999.9,100.0,100.0,140.0,255.0,190.0,...,3.0,3.0,3.0,3.0,3.0,4.0,255.0,1640000.0,120.0,1877.0
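
The table shows a minimum `Pokemon Weight` of 0.0, and the first rows of the dataset show the Gigantamax form of Venusaur with that value, so zero may be acting as a placeholder. The sketch below rebuilds those three Venusaur rows and flags the zero; treating it as missing is an assumption, not a decision taken in the notebook.

```python
import pandas as pd

# Height and weight values copied from the first rows shown earlier
toy = pd.DataFrame({
    "Pokemon Name": ["Venusaur", "Venusaur", "Venusaur"],
    "Alternate Form Name": [None, "Mega", "Gigantamax"],
    "Pokemon Height": [2.0, 2.4, 24.0],
    "Pokemon Weight": [100.0, 155.5, 0.0],
})

# Rows whose weight is exactly zero
zero_weight = toy[toy["Pokemon Weight"] == 0]
print(zero_weight[["Pokemon Name", "Alternate Form Name", "Pokemon Weight"]])

# Optional: mask() replaces the flagged values with NaN and leaves the rest intact
cleaned_weight = toy["Pokemon Weight"].mask(toy["Pokemon Weight"] == 0)
print(cleaned_weight.isna().sum())
```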


**Categorical variables**

In [12]:
df.describe(include='object')

Unnamed: 0,Pokemon Name,Classification,Alternate Form Name,Legendary Type,Primary Type,Secondary Type,Primary Ability,Primary Ability Description,Secondary Ability,Secondary Ability Description,Hidden Ability,Hidden Ability Description,Special Event Ability,Special Event Ability Description,Game(s) of Origin,Experience Growth,Primary Egg Group,Secondary Egg Group,Evolution Details
count,1382,1382,328,157,1382,752,1382,1382,652,652,1044,1044,2,2,1382,1382,1382,347,571
unique,1025,728,162,3,18,18,249,243,135,133,166,163,2,2,20,6,15,13,185
top,"""Unown""","""Symbol Pokémon""","""Mega""","""Sub-Legendary""","""Water""","""Flying""","""Levitate""","""By floating in the air, the Pokémon receives ...","""Compound Eyes""","""The Pokémon's compound eyes boost its accuracy.""","""Friend Guard""","""Reduces damage done to allies.""","""Battle Bond""","""When the Pokémon knocks out a target, its bon...","""Black""","""Medium Fast""","""Field""","""Dragon""","""Level 30"""
freq,28,28,44,70,159,152,69,69,24,24,27,27,1,1,166,596,317,68,29


---

### 2.4 Data Types by Column

**Data type of each column**

In [13]:
df.dtypes

Pokemon Id                             int64
Pokedex Number                         int64
Pokemon Name                          object
Classification                        object
Alternate Form Name                   object
Original Pokemon ID                  float64
Legendary Type                        object
Pokemon Height                       float64
Pokemon Weight                       float64
Primary Type                          object
Secondary Type                        object
Primary Ability                       object
Primary Ability Description           object
Secondary Ability                     object
Secondary Ability Description         object
Hidden Ability                        object
Hidden Ability Description            object
Special Event Ability                 object
Special Event Ability Description     object
Male Ratio                           float64
Female Ratio                         float64
Base Happiness                         int64
Game(s) of

**Splitting columns by type (numeric and categorical)**

In [14]:
print("Numeric columns:")
df.select_dtypes(include=['int64', 'float64']).columns.tolist()

Numeric columns:


['Pokemon Id',
 'Pokedex Number',
 'Original Pokemon ID',
 'Pokemon Height',
 'Pokemon Weight',
 'Male Ratio',
 'Female Ratio',
 'Base Happiness',
 'Health Stat',
 'Attack Stat',
 'Defense Stat',
 'Special Attack Stat',
 'Special Defense Stat',
 'Speed Stat',
 'Base Stat Total',
 'Health EV',
 'Attack EV',
 'Defense EV',
 'Special Attack EV',
 'Special Defense EV',
 'Speed EV',
 'EV Yield Total',
 'Catch Rate',
 'Experience Growth Total',
 'Egg Cycle Count',
 'Pre-Evolution Pokemon Id']

In [15]:

print("\nCategorical/text columns:")
df.select_dtypes(include=['object']).columns.tolist()


Categorical/text columns:


['Pokemon Name',
 'Classification',
 'Alternate Form Name',
 'Legendary Type',
 'Primary Type',
 'Secondary Type',
 'Primary Ability',
 'Primary Ability Description',
 'Secondary Ability',
 'Secondary Ability Description',
 'Hidden Ability',
 'Hidden Ability Description',
 'Special Event Ability',
 'Special Event Ability Description',
 'Game(s) of Origin',
 'Experience Growth',
 'Primary Egg Group',
 'Secondary Egg Group',
 'Evolution Details']
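
The same split can be written without listing concrete dtypes. This sketch assumes the generic `"number"` selector is acceptable here; it would also pick up other numeric widths such as `int32`, and `exclude="number"` should still work in setups where text columns use a dedicated string dtype rather than `object`.

```python
import pandas as pd

# Made-up sample with one integer, one float, and one text column
toy = pd.DataFrame({
    "Pokedex Number": [1, 2],
    "Pokemon Height": [0.7, 1.0],
    "Pokemon Name": ["Bulbasaur", "Ivysaur"],
})

numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Non-numeric:", other_cols)
```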

---

### 2.5 Unique Values and Cardinality

**Number of unique values per column**

In [16]:
df.nunique()

Pokemon Id                           1342
Pokedex Number                       1025
Pokemon Name                         1025
Classification                        728
Alternate Form Name                   162
Original Pokemon ID                   190
Legendary Type                          3
Pokemon Height                         84
Pokemon Weight                        522
Primary Type                           18
Secondary Type                         18
Primary Ability                       249
Primary Ability Description           243
Secondary Ability                     135
Secondary Ability Description         133
Hidden Ability                        166
Hidden Ability Description            163
Special Event Ability                   2
Special Event Ability Description       2
Male Ratio                              7
Female Ratio                            7
Base Happiness                          8
Game(s) of Origin                      20
Health Stat                       

**Unique values of the main categorical columns**

In [17]:
colunas_categoricas = df.select_dtypes(include=['object']).columns

for coluna in colunas_categoricas:
    print(f"\n{coluna}:")
    print(f"Unique values: {df[coluna].nunique()}")
    if df[coluna].nunique() <= 20:  # Show the values only when there are at most 20 unique ones
        print(f"Values: {df[coluna].unique()}")


Pokemon Name:
Unique values: 1025

Classification:
Unique values: 728

Alternate Form Name:
Unique values: 162

Legendary Type:
Unique values: 3
Values: [nan '"Sub-Legendary"' '"Legendary"' '"Mythical"']

Primary Type:
Unique values: 18
Values: ['"Grass"' '"Fire"' '"Water"' '"Bug"' '"Normal"' '"Dark"' '"Poison"'
 '"Electric"' '"Ground"' '"Ice"' '"Fairy"' '"Steel"' '"Fighting"'
 '"Psychic"' '"Rock"' '"Ghost"' '"Dragon"' '"Flying"']

Secondary Type:
Unique values: 18
Values: ['"Poison"' nan '"Flying"' '"Dragon"' '"Normal"' '"Psychic"' '"Steel"'
 '"Ground"' '"Fairy"' '"Grass"' '"Rock"' '"Fighting"' '"Electric"' '"Ice"'
 '"Dark"' '"Fire"' '"Water"' '"Ghost"' '"Bug"']

Primary Ability:
Unique values: 249

Primary Ability Description:
Unique values: 243

Secondary Ability:
Unique values: 135

Secondary Ability Description:
Unique values: 133

Hidden Ability:
Unique values: 166

Hidden Ability Description:
Unique values: 163

Special Event Ability:
Unique values:
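
In the output for `Legendary Type`, the count is 3 while the printed array has four entries. That is expected: `nunique()` skips `NaN` by default, whereas `unique()` returns it. A small sketch with made-up values shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up series that mimics the quoted values and nulls of 'Legendary Type'
legendary = pd.Series(
    ['"Legendary"', np.nan, '"Mythical"', np.nan, '"Sub-Legendary"'],
    name="Legendary Type",
)

print(legendary.nunique())               # NaN is not counted
print(legendary.nunique(dropna=False))   # NaN counts as one extra value
print(len(legendary.unique()))           # unique() includes NaN

# value_counts(dropna=False) shows how many rows are null next to the categories
print(legendary.value_counts(dropna=False))
```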

---

### 2.6 Data Sample

**First 10 rows of the dataset**

In [18]:
df.head(10)

Unnamed: 0,Pokemon Id,Pokedex Number,Pokemon Name,Classification,Alternate Form Name,Original Pokemon ID,Legendary Type,Pokemon Height,Pokemon Weight,Primary Type,...,Speed EV,EV Yield Total,Catch Rate,Experience Growth,Experience Growth Total,Primary Egg Group,Secondary Egg Group,Egg Cycle Count,Pre-Evolution Pokemon Id,Evolution Details
0,1,1,"""Bulbasaur""","""Seed Pokémon""",,,,0.7,6.9,"""Grass""",...,0,1,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,,
1,2,2,"""Ivysaur""","""Seed Pokémon""",,,,1.0,13.0,"""Grass""",...,0,2,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,1.0,"""Level 16"""
2,3,3,"""Venusaur""","""Seed Pokémon""",,,,2.0,100.0,"""Grass""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,2.0,"""Level 32"""
3,4,3,"""Venusaur""","""Seed Pokémon""","""Mega""",3.0,,2.4,155.5,"""Grass""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,,
4,1526,3,"""Venusaur""","""Seed Pokémon""","""Gigantamax""",3.0,,24.0,0.0,"""Grass""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Grass""",20,,
5,5,4,"""Charmander""","""Lizard Pokémon""",,,,0.6,8.5,"""Fire""",...,1,1,45,"""Medium Slow""",1059860,"""Monster""","""Dragon""",20,,
6,6,5,"""Charmeleon""","""Flame Pokémon""",,,,1.1,19.0,"""Fire""",...,1,2,45,"""Medium Slow""",1059860,"""Monster""","""Dragon""",20,5.0,"""Level 16"""
7,7,6,"""Charizard""","""Flame Pokémon""",,,,1.7,90.5,"""Fire""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Dragon""",20,6.0,"""Level 36"""
8,8,6,"""Charizard""","""Flame Pokémon""","""Mega X""",7.0,,1.7,110.5,"""Fire""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Dragon""",20,,
9,9,6,"""Charizard""","""Flame Pokémon""","""Mega Y""",7.0,,1.7,100.5,"""Fire""",...,0,3,45,"""Medium Slow""",1059860,"""Monster""","""Dragon""",20,,


**Last 10 rows of the dataset**

In [19]:
df.tail(10)

Unnamed: 0,Pokemon Id,Pokedex Number,Pokemon Name,Classification,Alternate Form Name,Original Pokemon ID,Legendary Type,Pokemon Height,Pokemon Weight,Primary Type,...,Speed EV,EV Yield Total,Catch Rate,Experience Growth,Experience Growth Total,Primary Egg Group,Secondary Egg Group,Egg Cycle Count,Pre-Evolution Pokemon Id,Evolution Details
1372,1871,1018,"""Archaludon""","""Alloy Pokémon""",,,,2.0,60.0,"""Steel""",...,0,3,10,"""Medium Fast""",1000000,"""Mineral""","""Dragon""",30,1424.0,"""Metal Alloy"""
1373,1883,1019,"""Hydrapple""","""Apple Hydra Pokémon""",,,,1.8,93.0,"""Grass""",...,0,3,10,"""Erratic""",600000,"""Grass""","""Dragon""",20,1870.0,"""Knowing Dragon Cheer"""
1374,1885,1020,"""Gouging Fire""","""Paradox Pokémon""",,,,3.5,590.0,"""Fire""",...,0,3,10,"""Slow""",1250000,"""No Eggs Discovered""",,50,,
1375,1872,1021,"""Raging Bolt""","""Paradox Pokémon""",,,,5.2,480.0,"""Electric""",...,0,3,10,"""Slow""",1250000,"""No Eggs Discovered""",,50,,
1376,1886,1022,"""Iron Boulder""","""Paradox Pokémon""",,,,1.5,162.5,"""Rock""",...,3,3,10,"""Slow""",1250000,"""No Eggs Discovered""",,50,,
1377,1873,1023,"""Iron Crown""","""Paradox Pokémon""",,,,1.6,156.0,"""Steel""",...,0,3,10,"""Slow""",1250000,"""No Eggs Discovered""",,50,,
1378,1882,1024,"""Terapagos""","""Tera Pokémon""","""Stellar""",1769.0,"""Legendary""",1.7,77.0,"""Normal""",...,0,3,255,"""Slow""",1250000,"""No Eggs Discovered""",,0,,
1379,1769,1024,"""Terapagos""","""Tera Pokémon""",,,"""Legendary""",0.2,6.5,"""Normal""",...,0,1,255,"""Slow""",1250000,"""No Eggs Discovered""",,0,,
1380,1770,1024,"""Terapagos""","""Tera Pokémon""","""Terastal""",1769.0,"""Legendary""",0.3,16.0,"""Normal""",...,0,4,255,"""Slow""",1250000,"""No Eggs Discovered""",,0,,
1381,1884,1025,"""Pecharunt""","""Subjugation Pokémon""",,,"""Mythical""",0.3,0.3,"""Poison""",...,0,3,3,"""Slow""",1250000,"""No Eggs Discovered""",,0,,


**Random sample of 10 records**

In [20]:
df.sample(10)

Unnamed: 0,Pokemon Id,Pokedex Number,Pokemon Name,Classification,Alternate Form Name,Original Pokemon ID,Legendary Type,Pokemon Height,Pokemon Weight,Primary Type,...,Speed EV,EV Yield Total,Catch Rate,Experience Growth,Experience Growth Total,Primary Egg Group,Secondary Egg Group,Egg Cycle Count,Pre-Evolution Pokemon Id,Evolution Details
279,298,198,"""Murkrow""","""Darkness Pokémon""",,,,0.5,2.1,"""Dark""",...,1,1,30,"""Medium Slow""",1059860,"""Flying""",,20,,
371,382,254,"""Sceptile""","""Forest Pokémon""",,,,1.7,52.2,"""Grass""",...,3,3,45,"""Medium Slow""",1059860,"""Monster""","""Dragon""",20,381.0,"""Level 36"""
1134,1499,838,"""Carkol""","""Coal Pokémon""",,,,1.1,78.0,"""Rock""",...,0,2,120,"""Medium Slow""",1059860,"""Mineral""",,15,1498.0,"""Level 18"""
714,871,537,"""Seismitoad""","""Vibration Pokémon""",,,,1.5,62.0,"""Water""",...,0,3,45,"""Medium Slow""",1059860,"""Water 1""",,20,870.0,"""Level 36"""
396,402,274,"""Nuzleaf""","""Wily Pokémon""",,,,1.0,28.0,"""Grass""",...,0,2,120,"""Medium Slow""",1059860,"""Field""","""Grass""",15,401.0,"""Level 14"""
282,324,200,"""Misdreavus""","""Screech Pokémon""",,,,0.7,1.0,"""Ghost""",...,0,1,45,"""Fast""",800000,"""Amorphous""",,25,,
1013,1268,742,"""Cutiefly""","""Bee Fly Pokémon""",,,,0.1,0.2,"""Bug""",...,1,1,190,"""Medium Fast""",1000000,"""Bug""","""Fairy""",20,,
577,1784,422,"""Shellos""","""Sea Slug Pokémon""","""East Sea""",620.0,,0.3,6.3,"""Water""",...,0,1,190,"""Medium Fast""",1000000,"""Water 1""","""Amorphous""",20,,
1170,1468,862,"""Obstagoon""","""Blocking Pokémon""",,,,1.6,46.0,"""Dark""",...,0,3,45,"""Medium Fast""",1000000,"""Field""",,15,1467.0,"""Level 35 At Night"""
490,1592,354,"""Banette""","""Marionette Pokémon""","""Mega""",482.0,,1.2,13.0,"""Ghost""",...,0,2,45,"""Fast""",800000,"""Amorphous""",,25,,
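
`df.sample(10)` returns different rows on every run, so the sample above will not be reproduced when the notebook is re-executed. If a stable sample is wanted, a seed can be fixed through `random_state`. The value 42 below is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up frame with 20 rows
toy = pd.DataFrame({"Pokedex Number": range(1, 21)})

# The same seed yields the same rows on every call
first_draw = toy.sample(5, random_state=42)
second_draw = toy.sample(5, random_state=42)

print(first_draw.index.tolist())
print(first_draw.equals(second_draw))
```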


---

### 2.7 Full Dataset Summary

In [21]:
print("="*80)
print("FULL SUMMARY OF THE POKÉMON DATASET")
print("="*80)
print(f"\n📊 DIMENSIONS:")
print(f"   • Total Pokémon records: {df.shape[0]:,}")
print(f"   • Total columns: {df.shape[1]}")
print(f"\n🔢 DATA TYPES:")
print(f"   • Numeric columns: {len(df.select_dtypes(include=['int64', 'float64']).columns)}")
print(f"   • Categorical columns: {len(df.select_dtypes(include=['object']).columns)}")
print(f"\n❌ NULL VALUES:")
print(f"   • Total: {df.isnull().sum().sum():,}")
print(f"   • Percentage: {(df.isnull().sum().sum() / (df.shape[0] * df.shape[1]) * 100):.2f}%")
print(f"\n🔄 DUPLICATES:")
print(f"   • Total duplicates: {df.duplicated().sum()}")
print(f"\n💾 MEMORY:")
print(f"   • Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
print("="*80)

FULL SUMMARY OF THE POKÉMON DATASET

📊 DIMENSIONS:
   • Total Pokémon records: 1,382
   • Total columns: 45

🔢 DATA TYPES:
   • Numeric columns: 26
   • Categorical columns: 19

❌ NULL VALUES:
   • Total: 11,479
   • Percentage: 18.46%

🔄 DUPLICATES:
   • Total duplicates: 3

💾 MEMORY:
   • Memory usage: 2.01 MB


---

## 3. Data Cleaning

In this section, we apply the transformations needed to clean and standardize the dataset.

---

### 3.1 Reducing the Dataset to the Most Relevant Columns

In [22]:
# Keep only the columns relevant to the analysis
colunas_selecionadas = [
    'Pokedex Number', 'Pokemon Name', 'Classification', 'Alternate Form Name',
    'Legendary Type', 'Pokemon Height', 'Pokemon Weight', 'Primary Type',
    'Secondary Type', 'Primary Ability', 'Secondary Ability', 'Hidden Ability',
    'Male Ratio', 'Female Ratio', 'Health Stat', 'Attack Stat', 'Defense Stat',
    'Special Attack Stat', 'Special Defense Stat', 'Speed Stat', 'Base Stat Total',
    'Catch Rate', 'Experience Growth', 'Experience Growth Total',
    'Primary Egg Group', 'Evolution Details'
]

df = df[colunas_selecionadas].copy()

print("✓ Dataset reduced to the selected columns!")
print(f"Total columns: {df.shape[1]}")
print(f"Total records: {df.shape[0]}")
print("\nDataset columns:")
df.columns.tolist()

✓ Dataset reduced to the selected columns!
Total columns: 26
Total records: 1382

Dataset columns:


['Pokedex Number',
 'Pokemon Name',
 'Classification',
 'Alternate Form Name',
 'Legendary Type',
 'Pokemon Height',
 'Pokemon Weight',
 'Primary Type',
 'Secondary Type',
 'Primary Ability',
 'Secondary Ability',
 'Hidden Ability',
 'Male Ratio',
 'Female Ratio',
 'Health Stat',
 'Attack Stat',
 'Defense Stat',
 'Special Attack Stat',
 'Special Defense Stat',
 'Speed Stat',
 'Base Stat Total',
 'Catch Rate',
 'Experience Growth',
 'Experience Growth Total',
 'Primary Egg Group',
 'Evolution Details']
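
Selecting with a list of labels raises a `KeyError` if any label is absent, for example after a column is renamed upstream. A defensive variant, sketched here on a made-up frame with the hypothetical names `wanted`, `missing`, and `available`, checks the labels first:

```python
import pandas as pd

# Made-up raw frame that lacks one of the requested columns
raw = pd.DataFrame({
    "Pokedex Number": [1],
    "Pokemon Name": ["Bulbasaur"],
    "Primary Type": ["Grass"],
})

wanted = ["Pokedex Number", "Pokemon Name", "Secondary Type"]

# Report absent labels instead of failing in the middle of the selection
missing = [col for col in wanted if col not in raw.columns]
available = [col for col in wanted if col in raw.columns]
if missing:
    print(f"Columns not found: {missing}")

subset = raw[available].copy()
print(subset.columns.tolist())
```

Whether to continue with the available columns or stop with an error is a design choice; for a cleaning notebook, failing loudly may be the safer default.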

---

### 3.2 Removing Quotation Marks from All object (String) Columns

In [23]:
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].str.replace('"', '', regex=False)

print("✓ Quotation marks removed from all string columns!")

✓ Quotation marks removed from all string columns!
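
`str.replace('"', '')` removes every double quote in a value, including any inside the text. If only the wrapping quotes should go, `str.strip('"')` is a possible alternative. The sketch below uses made-up values and is not the notebook's method; in both approaches null entries stay null.

```python
import numpy as np
import pandas as pd

# Made-up sample with wrapped values, a null, and a numeric column
toy = pd.DataFrame({
    "Pokemon Name": ['"Bulbasaur"', '"Oricorio"'],
    "Alternate Form Name": [np.nan, '"Pa\'u Style"'],
    "Catch Rate": [45, 45],
})

# Strip leading and trailing double quotes from text columns only
text_cols = toy.select_dtypes(exclude="number").columns
toy[text_cols] = toy[text_cols].apply(lambda s: s.str.strip('"'))

print(toy)
```

The apostrophe in `Pa'u Style` is untouched because only the double-quote character is passed to `strip`.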


In [24]:
df.head()

Unnamed: 0,Pokedex Number,Pokemon Name,Classification,Alternate Form Name,Legendary Type,Pokemon Height,Pokemon Weight,Primary Type,Secondary Type,Primary Ability,...,Defense Stat,Special Attack Stat,Special Defense Stat,Speed Stat,Base Stat Total,Catch Rate,Experience Growth,Experience Growth Total,Primary Egg Group,Evolution Details
0,1,Bulbasaur,Seed Pokémon,,,0.7,6.9,Grass,Poison,Overgrow,...,49,65,65,45,318,45,Medium Slow,1059860,Monster,
1,2,Ivysaur,Seed Pokémon,,,1.0,13.0,Grass,Poison,Overgrow,...,63,80,80,60,405,45,Medium Slow,1059860,Monster,Level 16
2,3,Venusaur,Seed Pokémon,,,2.0,100.0,Grass,Poison,Overgrow,...,83,100,100,80,525,45,Medium Slow,1059860,Monster,Level 32
3,3,Venusaur,Seed Pokémon,Mega,,2.4,155.5,Grass,Poison,Thick Fat,...,123,122,120,80,625,45,Medium Slow,1059860,Monster,
4,3,Venusaur,Seed Pokémon,Gigantamax,,24.0,0.0,Grass,Poison,Overgrow,...,83,100,100,80,525,45,Medium Slow,1059860,Monster,


---

### 3.3 Filling Null Values in the "Alternate Form Name" Column and Adjusting the Categories

In [25]:
print(f"Null values in 'Alternate Form Name' before: {df['Alternate Form Name'].isnull().sum()}")

Null values in 'Alternate Form Name' before: 1054


**Filling in specific alternate forms**

In [26]:
# Dictionary of Pokémon and their specific alternate forms
pokemon_formas_especificas = {
    'Giratina': ['Altered', 'Origin'],
    'Tornadus': ['Incarnate', 'Therian'],
    'Thundurus': ['Incarnate', 'Therian'],
    'Landorus': ['Incarnate', 'Therian'],
    'Enamorus': ['Incarnate', 'Therian'],
    'Kyurem': ['Normal', 'Black', 'White'],
    'Keldeo': ['Ordinary', 'Resolute'],
    'Meloetta': ['Aria', 'Pirouette'],
    'Hoopa': ['Hoopa Confined', 'Hoopa Unbound'],
    'Zacian': ['Hero of Many Battles', 'Crowned Sword'],
    'Zamazenta': ['Hero of Many Battles', 'Crowned Shield'],
    'Toxtricity': ['Amped', 'Low Key'],
    'Urshifu': ['Single Strike Style', 'Rapid Strike Style'],
    'Zygarde': ['10% Forme', '50% Forme', 'Complete Forme'],
    'Ogerpon': ['Teal Mask', 'Wellspring Mask', 'Hearthflame Mask', 'Cornerstone Mask'],
    'Shaymin': ['Land Forme', 'Sky Forme'],
    'Cherrim': ['Overcast Form', 'Sunshine Form'],
    'Aegislash': ['Shield Forme', 'Blade Forme'],
    'Wishiwashi': ['Solo Form', 'School Form'],
    'Mimikyu': ['Disguised Form', 'Busted Form'],
    'Eiscue': ['Ice Face', 'Noice Face'],
    'Morpeko': ['Full Belly Mode', 'Hangry Mode'],
    'Palafin': ['Zero Form', 'Hero Form'],
    'Oricorio': ['Baile Style', 'Pom-Pom Style', "Pa'u Style", 'Sensu Style'],
    'Lycanroc': ['Midday Form', 'Midnight Form', 'Dusk Form'],
    'Tatsugiri': ['Curly Form', 'Droopy Form', 'Stretchy Form']
}

print("FILLING IN SPECIFIC ALTERNATE FORMS")

total_preenchidos = 0

for pokemon_name, formas_possiveis in pokemon_formas_especificas.items():
    # Find every record for this Pokémon
    pokemon_mask = df['Pokemon Name'] == pokemon_name
    pokemon_registros = df[pokemon_mask]

    if len(pokemon_registros) > 0:
        print(f"\n{pokemon_name}:")
        print(f"  Total records: {len(pokemon_registros)}")

        # Check which forms already exist
        formas_existentes = pokemon_registros['Alternate Form Name'].dropna().unique().tolist()
        print(f"  Forms already recorded: {formas_existentes if formas_existentes else 'None'}")

        # Find the records with null values
        pokemon_nulos = df[pokemon_mask & df['Alternate Form Name'].isnull()]

        if len(pokemon_nulos) > 0:
            print(f"  Records with null values: {len(pokemon_nulos)}")

            # Determine which forms are missing
            formas_faltantes = [forma for forma in formas_possiveis if forma not in formas_existentes]
            print(f"  Missing forms: {formas_faltantes}")

            # Fill the nulls with the missing forms, pairing them in row order.
            # The pairing is positional: the first null row gets the first missing form.
            for idx, registro_idx in enumerate(pokemon_nulos.index):
                if idx < len(formas_faltantes):
                    df.loc[registro_idx, 'Alternate Form Name'] = formas_faltantes[idx]
                    print(f"    ✓ Index {registro_idx}: Filled with '{formas_faltantes[idx]}'")
                    total_preenchidos += 1
                else:
                    print(f"    ⚠️  Index {registro_idx}: More null records than available forms")
        else:
            print("  ✓ No null values found")
    else:
        print(f"\n{pokemon_name}: Not found in the dataset")

print("\n")
print(f"TOTAL VALUES FILLED: {total_preenchidos}")

FILLING IN SPECIFIC ALTERNATE FORMS

Giratina:
  Total records: 2
  Forms already recorded: ['Origin']
  Records with null values: 1
  Missing forms: ['Altered']
    ✓ Index 659: Filled with 'Altered'

Tornadus:
  Total records: 2
  Forms already recorded: ['Therian']
  Records with null values: 1
  Missing forms: ['Incarnate']
    ✓ Index 839: Filled with 'Incarnate'

Thundurus:
  Total records: 2
  Forms already recorded: ['Therian']
  Records with null values: 1
  Missing forms: ['Incarnate']
    ✓ Index 841: Filled with 'Incarnate'

Landorus:
  Total records: 2
  Forms already recorded: ['Therian']
  Records with null values: 1
  Missing forms: ['Incarnate']
    ✓ Index 845: Filled with 'Incarnate'

Enamorus:
  Total records: 2
  Forms already recorded: ['Therian']
  Records with null values: 1
  Missing forms: ['Incarnate']
    ✓ Index 1242: Filled with 'Incarnate'

Kyurem:
  Total d

In [27]:
df['Alternate Form Name'] = df['Alternate Form Name'].fillna('None')

In [28]:
print(f"\nNull values in 'Alternate Form Name' after: {df['Alternate Form Name'].isnull().sum()}")


Null values in 'Alternate Form Name' after: 0
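
The fill rule used above is positional: for each listed Pokémon, the forms it lacks are handed to its null rows in row order. The sketch below restates that rule as a function on a made-up frame so the assumption is easy to see. `fill_missing_forms` is a hypothetical helper, not code from the notebook.

```python
import numpy as np
import pandas as pd


def fill_missing_forms(df, forms_by_pokemon,
                       name_col="Pokemon Name", form_col="Alternate Form Name"):
    """Give each Pokémon's null rows the forms it lacks, in row order. Returns a copy."""
    out = df.copy()
    for name, possible_forms in forms_by_pokemon.items():
        rows = out[name_col] == name
        present = set(out.loc[rows, form_col].dropna())
        missing = [form for form in possible_forms if form not in present]
        null_index = out[rows & out[form_col].isna()].index
        # zip stops at the shorter sequence, so surplus null rows stay null
        for idx, form in zip(null_index, missing):
            out.loc[idx, form_col] = form
    return out


toy = pd.DataFrame({
    "Pokemon Name": ["Giratina", "Giratina", "Keldeo", "Pikachu"],
    "Alternate Form Name": [np.nan, "Origin", np.nan, np.nan],
})

filled = fill_missing_forms(
    toy,
    {"Giratina": ["Altered", "Origin"], "Keldeo": ["Ordinary", "Resolute"]},
)
print(filled)
```

In this toy data Keldeo has a single null row and two missing forms, so it receives `Ordinary` only because that form is listed first. The same holds in the real run, which is why the order of each list in the dictionary matters.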



**Checking the alternate forms in the dataset**

In [29]:
# Check the alternate forms before the transformation
print("Unique alternate forms:")
df['Alternate Form Name'].value_counts()

Unique alternate forms:


Alternate Form Name
None          1027
Mega            44
Gigantamax      32
Hisui           23
Alola           21
              ... 
Ordinary         1
Resolute         1
Aria             1
Pirouette        1
Terastal         1
Name: count, Length: 186, dtype: int64
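
One caveat about using the literal string `'None'` as a placeholder: recent pandas versions list `None` among the strings that `read_csv` treats as missing by default, so the placeholder may turn back into `NaN` when the cleaned CSV is reloaded. The exact behavior may depend on the installed pandas version. The sketch below round-trips a made-up frame through an in-memory buffer and shows one way to keep the placeholder.

```python
import io

import pandas as pd

# Made-up frame that uses the same 'None' placeholder
toy = pd.DataFrame({
    "Pokemon Name": ["Bulbasaur", "Venusaur"],
    "Alternate Form Name": ["None", "Mega"],
})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default read: 'None' may be parsed as a missing value
buffer.seek(0)
default_read = pd.read_csv(buffer)
print(default_read["Alternate Form Name"].isna().sum())

# Strict read: only empty fields count as missing, so 'None' survives as text
buffer.seek(0)
strict_read = pd.read_csv(buffer, keep_default_na=False, na_values=[""])
print(strict_read["Alternate Form Name"].tolist())
```

An alternative is a placeholder that is not on the default list, such as `'Base'`; that name is only a suggestion.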

**Adding the alternate form to the Pokémon name**

In [30]:
# Update 'Pokemon Name' to include 'Alternate Form Name' whenever it is not 'None'
df['Pokemon Name'] = df.apply(
    lambda row: f"{row['Pokemon Name']} ({row['Alternate Form Name']})" 
    if row['Alternate Form Name'] != 'None' 
    else row['Pokemon Name'], 
    axis=1
)

print("✓ 'Pokemon Name' column updated with 'Alternate Form Name'!")
print(f"\nExamples:")
print(df[df['Alternate Form Name'] != 'None'][['Pokemon Name', 'Alternate Form Name']].head(10))

✓ 'Pokemon Name' column updated with 'Alternate Form Name'!

Examples:
               Pokemon Name Alternate Form Name
3           Venusaur (Mega)                Mega
4     Venusaur (Gigantamax)          Gigantamax
8        Charizard (Mega X)              Mega X
9        Charizard (Mega Y)              Mega Y
10   Charizard (Gigantamax)          Gigantamax
14         Blastoise (Mega)                Mega
15   Blastoise (Gigantamax)          Gigantamax
19  Butterfree (Gigantamax)          Gigantamax
23          Beedrill (Mega)                Mega
27           Pidgeot (Mega)                Mega
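
The row-wise `apply` works, but the same labels can be built with column operations. This is a sketch of a vectorized alternative on made-up rows, assuming the `'None'` placeholder is already in place; it is not the notebook's code.

```python
import pandas as pd

# Made-up rows that follow the notebook's conventions
toy = pd.DataFrame({
    "Pokemon Name": ["Venusaur", "Venusaur", "Charizard"],
    "Alternate Form Name": ["None", "Mega", "Mega X"],
})

has_form = toy["Alternate Form Name"] != "None"
labelled = toy["Pokemon Name"] + " (" + toy["Alternate Form Name"] + ")"

# where() keeps the original name when the condition is True and takes `labelled` otherwise
toy["Pokemon Name"] = toy["Pokemon Name"].where(~has_form, labelled)
print(toy["Pokemon Name"].tolist())
```

Like the original cell, this should run only once per DataFrame: a second run would append the form to names that already carry it.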



**Adjusting Gigantamax and Mega forms**

In [31]:
# Collapse the prefixed Gigantamax variants into plain 'Gigantamax'
df['Alternate Form Name'] = df['Alternate Form Name'].str.replace('Rapid Strike Gigantamax', 'Gigantamax', regex=False)
df['Alternate Form Name'] = df['Alternate Form Name'].str.replace('Low Key Gigantamax', 'Gigantamax', regex=False)

print("✓ Prefixes removed from the Gigantamax forms!")
print("\nGigantamax forms after the transformation:")
print(df[df['Alternate Form Name'].str.contains('Gigantamax', na=False)]['Alternate Form Name'].unique())

✓ Prefixes removed from the Gigantamax forms!

Gigantamax forms after the transformation:
['Gigantamax']
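
The Gigantamax and Mega adjustments in this part of the notebook are four separate substring replacements. As a hedged alternative, they could be expressed as a single lookup table. In the sketch below, `FORM_ALIASES` and the sample `forms` Series are hypothetical; note also that `Series.replace` with a dict matches whole values only, which is a slightly stricter behavior than the `str.replace` calls used in the notebook.

```python
import pandas as pd

# Hypothetical mapping built from the four replacements used in this notebook
FORM_ALIASES = {
    "Rapid Strike Gigantamax": "Gigantamax",
    "Low Key Gigantamax": "Gigantamax",
    "Mega X": "Mega",
    "Mega Y": "Mega",
}

# Illustrative sample of form names
forms = pd.Series(["None", "Mega X", "Mega Y", "Low Key Gigantamax", "Galar"])

# Series.replace with a dict swaps exact matches and leaves other values untouched
normalized = forms.replace(FORM_ALIASES)
print(normalized.tolist())
```

Keeping the aliases in one dictionary makes it easier to review which forms are merged and to add new ones later.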


In [32]:
# Remove the X and Y suffixes from Mega forms in the 'Alternate Form Name' column
df['Alternate Form Name'] = df['Alternate Form Name'].str.replace('Mega X', 'Mega', regex=False)
df['Alternate Form Name'] = df['Alternate Form Name'].str.replace('Mega Y', 'Mega', regex=False)

print("✓ X and Y suffixes removed from the Mega forms!")
print("\nDistinct forms after the transformation:")
df['Alternate Form Name'].unique()

✓ X and Y suffixes removed from the Mega forms!

Distinct forms after the transformation:


array(['None', 'Mega', 'Gigantamax', 'Alola', 'Starter', 'Galar', 'Hisui',
       'Paldean Combat Breed', 'Paldean Blaze Breed',
       'Paldean Aqua Breed', 'Paldea', 'A', 'B', 'C', 'D', 'E', 'G', 'H',
       'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U',
       'V', 'W', 'X', 'Y', 'Z', '!', '?', 'Sunny', 'Rainy', 'Snowy',
       'Primal', 'Attack', 'Defense', 'Speed', 'Sandy', 'Trash',
       'Overcast Form', 'East Sea', 'Mow', 'Heat', 'Wash', 'Frost', 'Fan',
       'Origin', 'Altered', 'Land Forme', 'Sky', 'White-Striped',
       'Blue-Striped', 'Zen', 'Galar Zen', 'Summer', 'Autumn', 'Winter',
       'Incarnate', 'Therian', 'Normal', 'White', 'Black', 'Ordinary',
       'Resolute', 'Aria', 'Pirouette', 'Ash', 'Polar Pattern',
       'Tundra Pattern', 'Continental Pattern', 'Garden Pattern',
       'Elegant Pattern', 'Icy Snow Pattern', 'Modern Pattern',
       'Marine Pattern', 'Archipelago Pattern', 'High Plains Pattern',
       'Sandstorm Pattern', 'River Patter

In [33]:
print("Sample of Pokémon with alternate forms:")
df[df['Alternate Form Name'] != 'None'][['Pokemon Name', 'Alternate Form Name']].sample(30)

Sample of Pokémon with alternate forms:


Unnamed: 0,Pokemon Name,Alternate Form Name
1178,Alcremie (Ruby Cream),Ruby Cream
880,Vivillon (Marine Pattern),Marine Pattern
47,Sandslash (Alola),Alola
814,Stunfisk (Galar),Galar
27,Pidgeot (Mega),Mega
708,Audino (Mega),Mega
1107,Cinderace (Gigantamax),Gigantamax
740,Darmanitan (Galar Zen),Galar Zen
281,Slowking (Galar),Galar
1265,Maushold (Family of Three),Family of Three


**Removing Pokémon with extra forms that do not affect the analysis**

In [34]:
def extract_base_name(pokemon_name):
    if pd.isna(pokemon_name):
        return pokemon_name
    # Keep only the text before the first opening parenthesis
    return pokemon_name.split('(')[0].strip()

df['Original Name'] = df['Pokemon Name'].apply(extract_base_name)

print("✓ Column 'Original Name' created!")
print("\nExamples:")
print(df[['Pokemon Name', 'Original Name']].head(10).to_string(index=False))

✓ Column 'Original Name' created!

Examples:
         Pokemon Name Original Name
            Bulbasaur     Bulbasaur
              Ivysaur       Ivysaur
             Venusaur      Venusaur
      Venusaur (Mega)      Venusaur
Venusaur (Gigantamax)      Venusaur
           Charmander    Charmander
           Charmeleon    Charmeleon
            Charizard     Charizard
   Charizard (Mega X)     Charizard
   Charizard (Mega Y)     Charizard
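
The base-name extraction above can also be done with pandas string methods instead of a Python function. This is a small sketch on a hypothetical `names` Series; it mirrors the split-on-first-parenthesis logic of `extract_base_name` and leaves missing values missing.

```python
import pandas as pd

# Illustrative names, including a missing value
names = pd.Series(["Bulbasaur", "Venusaur (Mega)", "Charizard (Mega X)", None])

# Split once on the first "(", keep the left part, and trim the trailing space
base = names.str.split("(", n=1).str[0].str.strip()
print(base.tolist())
```

The result for the three non-missing entries is the plain species name, and the last entry stays null.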


In [35]:
# Find Pokémon whose 'Original Name' is duplicated (i.e., they have multiple forms)
pokemon_duplicados = df[df.duplicated(subset=['Original Name'], keep=False)]

print("="*80)
print("ANALYSIS OF POKÉMON WITH MULTIPLE FORMS")
print("="*80)

# Stat and ability columns used for the comparison
colunas_comparacao = [
    'Health Stat', 'Attack Stat', 'Defense Stat', 
    'Special Attack Stat', 'Special Defense Stat', 'Speed Stat',
    'Base Stat Total', 'Primary Ability', 'Secondary Ability', 'Hidden Ability'
]

# Group by 'Original Name'
grupos = pokemon_duplicados.groupby('Original Name')

print(f"\nTotal Pokémon with multiple forms: {len(grupos)}")
print("\n" + "="*80)

# List every Pokémon whose extra forms show NO differences in stats/abilities
print("\nPOKÉMON WITH FORMS THAT SHOW NO DIFFERENCES (not relevant to the analysis):")
print("="*80)

pokemon_para_remover = []

# Protected Pokémon (must not be removed)
if 'pokemon_formas_especificas' in locals():
    pokemon_protegidos = list(pokemon_formas_especificas.keys())
    print(f"\nProtected Pokémon (will not be removed): {pokemon_protegidos}")
    print("="*80)
else:
    pokemon_protegidos = []

for nome_original, grupo in grupos:
    # Skip protected Pokémon
    if nome_original in pokemon_protegidos:
        print(f"\n{nome_original}: ⚠️ Protected Pokémon - forms kept")
        continue
    
    if len(grupo) > 1:
        # Compare every later form against the first form of the group
        primeira_forma = grupo.iloc[0]
        
        for idx, forma in grupo.iloc[1:].iterrows():
            formas_identicas = True
            
            for col in colunas_comparacao:
                val1 = primeira_forma[col]
                val2 = forma[col]
                
                # Compare values, treating two NaN values as equal
                if pd.isna(val1) and pd.isna(val2):
                    continue
                elif val1 != val2:
                    formas_identicas = False
                    break
            
            # If this form is identical to the first one, mark it for removal
            if formas_identicas:
                pokemon_para_remover.append(idx)
                
        # If identical forms were found, print the details
        if any(idx in pokemon_para_remover for idx in grupo.index[1:]):
            print(f"\n{nome_original}:")
            print(f"  Total forms: {len(grupo)}")
            print(f"  Forms present: {grupo['Alternate Form Name'].tolist()}")
            
            formas_identicas_nomes = [
                grupo.loc[idx, 'Alternate Form Name'] 
                for idx in grupo.index[1:] 
                if idx in pokemon_para_remover
            ]
            print(f"  Identical forms (marked for removal): {formas_identicas_nomes}")
            print(f"  Form kept: {primeira_forma['Alternate Form Name']}")

print("\n" + "="*80)
print(f"\nTotal records with identical forms: {len(pokemon_para_remover)}")
print("\nFirst 10 indices to be removed:", pokemon_para_remover[:10])

ANALYSIS OF POKÉMON WITH MULTIPLE FORMS

Total Pokémon with multiple forms: 205


POKÉMON WITH FORMS THAT SHOW NO DIFFERENCES (not relevant to the analysis):

Protected Pokémon (will not be removed): ['Giratina', 'Tornadus', 'Thundurus', 'Landorus', 'Enamorus', 'Kyurem', 'Keldeo', 'Meloetta', 'Hoopa', 'Zacian', 'Zamazenta', 'Toxtricity', 'Urshifu', 'Zygarde', 'Ogerpon', 'Shaymin', 'Cherrim', 'Aegislash', 'Wishiwashi', 'Mimikyu', 'Eiscue', 'Morpeko', 'Palafin', 'Oricorio', 'Lycanroc', 'Tatsugiri']

Aegislash: ⚠️ Protected Pokémon - forms kept

Alakazam:
  Total forms: 3
  Forms present: ['None', 'None', 'Mega']
  Identical forms (marked for removal): ['None']
  Form kept: None

Alcremie:
  Total forms: 10
  Forms present: ['None', 'Ruby Cream', 'Matcha Cream', 'Mint Cream', 'Lemon Cream', 'Salted Cream', 'Ruby Swirl', 'Caramel Swirl', 'Rainbow Swirl', 'Gigantamax']
  Identical forms (marked for removal): ['Ruby Cream', 'Matcha Cream', 'Mint Cream', 'Lemon Cream', 'Sa

In [36]:
# Drop the identical forms identified in the previous cell
print(f"Records before removal: {len(df)}")
df = df.drop(index=pokemon_para_remover)
df = df.reset_index(drop=True)
print(f"Records after removing identical forms: {len(df)}")
print(f"Total removed: {len(pokemon_para_remover)}")
print("\n" + "="*80)

Records before removal: 1382
Records after removing identical forms: 1226
Total removed: 156
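
The nested loops that compare each form against the first form of its group can be approximated with column operations. The sketch below is one possible way to do it, under stated assumptions: the helper `rows_identical_to_first_form` and the `demo` values are hypothetical (the stat numbers are invented for illustration), and protected Pokémon are not handled here.

```python
import pandas as pd

def rows_identical_to_first_form(frame, key, compare_cols):
    """Return index labels of rows that match the first row of their group on compare_cols."""
    # First row of every group, in original order
    first_rows = frame.drop_duplicates(subset=key, keep="first")
    # Attach the first row's values to every row of the same group
    ref = frame[[key]].merge(first_rows[[key] + compare_cols], on=key, how="left")
    ref.index = frame.index
    left, right = frame[compare_cols], ref[compare_cols]
    # Two values count as equal if they compare equal or are both missing
    same = ((left == right) | (left.isna() & right.isna())).all(axis=1)
    is_first = ~frame.duplicated(subset=key, keep="first")
    return frame.index[same & ~is_first].tolist()

# Hypothetical rows with invented values, for illustration only
demo = pd.DataFrame(
    {
        "Original Name": ["Alakazam", "Alakazam", "Alakazam", "Abra"],
        "Alternate Form Name": ["None", "None", "Mega", "None"],
        "Base Stat Total": [500, 500, 600, 310],
        "Secondary Ability": [None, None, "Some Ability", "Other Ability"],
    },
    index=[10, 11, 12, 13],
)

to_drop = rows_identical_to_first_form(
    demo, "Original Name", ["Base Stat Total", "Secondary Ability"]
)
print(to_drop)
```

Here only the second Alakazam row is flagged, because it matches the first row on both columns (including the shared missing ability), while the Mega row differs.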



In [37]:
print("="*80)
print("FINAL SUMMARY OF ALTERNATE FORMS")
print("="*80)

pokemon_com_formas_final = df[df.duplicated(subset=['Original Name'], keep=False)]

print(f"\nTotal unique Pokémon: {df['Original Name'].nunique()}")
print(f"Total records in the dataset: {len(df)}")
print(f"Pokémon with multiple forms remaining: {len(pokemon_com_formas_final.groupby('Original Name'))}")

print("\n" + "="*80)
print("Alternate forms that remain in the dataset:")
print("="*80)

if len(pokemon_com_formas_final) > 0:
    grupos_final = pokemon_com_formas_final.groupby('Original Name')
    for nome, grupo in grupos_final:
        print(f"\n{nome}: {grupo['Alternate Form Name'].tolist()}")
else:
    print("\nNo Pokémon with multiple forms in the final dataset!")

print("\n" + "="*80)

FINAL SUMMARY OF ALTERNATE FORMS

Total unique Pokémon: 1024
Total records in the dataset: 1226
Pokémon with multiple forms remaining: 141

Alternate forms that remain in the dataset:

Abomasnow: ['None', 'Mega']

Absol: ['None', 'Mega']

Aegislash: ['Shield Forme', 'Blade']

Aerodactyl: ['None', 'Mega']

Aggron: ['None', 'Mega']

Alakazam: ['None', 'Mega']

Altaria: ['None', 'Mega']

Ampharos: ['None', 'Mega']

Arcanine: ['None', 'Hisui']

Articuno: ['None', 'Galar']

Audino: ['None', 'Mega']

Avalugg: ['Hisui', 'None']

Banette: ['None', 'Mega']

Basculegion: ['Female', 'None']

Basculin: ['None', 'White-Striped', 'Blue-Striped']

Beedrill: ['None', 'Mega']

Blastoise: ['None', 'Mega']

Blaziken: ['None', 'Mega']

Braviary: ['None', 'Hisui', 'Hisui']

Calyrex: ['None', 'Ice Rider', 'Shadow Rider']

Camerupt: ['None', 'Mega']

Charizard: ['None', 'Mega', 'Mega']

Corsola: ['None', 'Galar']

Darmanitan: ['None', 'Zen', 'Galar', 'Galar Zen']

Decidueye: ['None', 'Hisui'

**Correcting the redundant forms of "Urshifu"**

In [38]:
print("="*80)
print("CORRECTING URSHIFU FORMS")
print("="*80)

# Select every Urshifu row
urshifu_all = df[df['Original Name'] == 'Urshifu']

print(f"\nTotal Urshifu forms before correction: {len(urshifu_all)}")
print("\nCurrent forms:")
print(urshifu_all[['Pokemon Name', 'Alternate Form Name', 'Primary Type', 'Secondary Type', 'Evolution Details']].to_string())

indices_urshifu_para_remover = []

# Process each form
for idx, row in urshifu_all.iterrows():
    evolution_details = str(row['Evolution Details']) if pd.notna(row['Evolution Details']) else ''
    alternate_form = row['Alternate Form Name']
    secondary_type = row['Secondary Type']
    
    # Non-Gigantamax forms
    if 'Gigantamax' not in alternate_form:
        # Keep only the forms with the specific Evolution Details
        if 'Scroll of Darkness' in evolution_details:
            # This should be Single Strike Style; fix the labels if needed
            if secondary_type == 'Dark':
                df.loc[idx, 'Alternate Form Name'] = 'Single Strike Style'
                df.loc[idx, 'Pokemon Name'] = 'Urshifu (Single Strike Style)'
                print(f"\n✓ Index {idx}: Corrected to Single Strike Style (Scroll of Darkness)")
            else:
                # Inconsistent Evolution Details: remove
                indices_urshifu_para_remover.append(idx)
                print(f"\n✗ Index {idx}: Evolution Details is 'Scroll of Darkness' but the secondary type is not Dark - remove")
        elif 'Scroll of Waters' in evolution_details:
            # This should be Rapid Strike Style; fix the labels if needed
            if secondary_type == 'Water':
                df.loc[idx, 'Alternate Form Name'] = 'Rapid Strike Style'
                df.loc[idx, 'Pokemon Name'] = 'Urshifu (Rapid Strike Style)'
                print(f"\n✓ Index {idx}: Corrected to Rapid Strike Style (Scroll of Waters)")
            else:
                # Inconsistent Evolution Details: remove
                indices_urshifu_para_remover.append(idx)
                print(f"\n✗ Index {idx}: Evolution Details is 'Scroll of Waters' but the secondary type is not Water - remove")
        else:
            # No matching Evolution Details: remove
            indices_urshifu_para_remover.append(idx)
            print(f"\n✗ Index {idx}: No matching Evolution Details - remove")
            
    # Gigantamax forms
    else:
        if secondary_type == 'Dark':
            # Single Strike Gigantamax: keep Alternate Form Name as "Gigantamax"
            df.loc[idx, 'Alternate Form Name'] = 'Gigantamax'
            df.loc[idx, 'Pokemon Name'] = 'Urshifu (Single Strike Gigantamax)'
            print(f"\n✓ Index {idx}: Corrected to Single Strike Gigantamax (Alternate Form Name: Gigantamax)")
        elif secondary_type == 'Water':
            # Rapid Strike Gigantamax: keep Alternate Form Name as "Gigantamax"
            df.loc[idx, 'Alternate Form Name'] = 'Gigantamax'
            df.loc[idx, 'Pokemon Name'] = 'Urshifu (Rapid Strike Gigantamax)'
            print(f"\n✓ Index {idx}: Corrected to Rapid Strike Gigantamax (Alternate Form Name: Gigantamax)")
        else:
            # Unexpected type: remove
            indices_urshifu_para_remover.append(idx)
            print(f"\n✗ Index {idx}: Gigantamax with an unexpected secondary type - remove")

print("\n" + "="*80)
print(f"\nTotal Urshifu forms marked for removal: {len(indices_urshifu_para_remover)}")
print(f"Indices: {indices_urshifu_para_remover}")
print("="*80)

CORRECTING URSHIFU FORMS

Total Urshifu forms before correction: 6

Current forms:
                           Pokemon Name  Alternate Form Name Primary Type Secondary Type                                       Evolution Details
1070      Urshifu (Single Strike Style)  Single Strike Style     Fighting           Dark  Conquer the Tower of Darkness in Galar's Isle of Armor
1071       Urshifu (Rapid Strike Style)   Rapid Strike Style     Fighting           Dark                                      Scroll of Darkness
1072             Urshifu (Rapid Strike)         Rapid Strike     Fighting          Water    Conquer the Tower of Waters in Galar's Isle of Armor
1073             Urshifu (Rapid Strike)         Rapid Strike     Fighting          Water                                        Scroll of Waters
1074               Urshifu (Gigantamax)           Gigantamax     Fighting           Dark                                                     NaN
1075  Urshifu (Rapid Strike Gi

In [39]:
# Drop the unwanted Urshifu forms
if len(indices_urshifu_para_remover) > 0:
    print(f"\nRecords before removal: {len(df)}")
    df = df.drop(index=indices_urshifu_para_remover)
    df = df.reset_index(drop=True)
    print(f"Records after removal: {len(df)}")
    print(f"Total removed: {len(indices_urshifu_para_remover)}")
    print("\n✓ Urshifu forms corrected successfully!")
else:
    print("\nNo Urshifu forms to remove.")

print("\n" + "="*80)


Records before removal: 1226
Records after removal: 1224
Total removed: 2

✓ Urshifu forms corrected successfully!



In [40]:
# Check the final Urshifu result
print("="*80)
print("FINAL CHECK - URSHIFU")
print("="*80)

urshifu_final = df[df['Original Name'] == 'Urshifu']
print(f"\nTotal Urshifu forms: {len(urshifu_final)}")
print("\nFinal forms:")
print(urshifu_final[['Pokemon Name', 'Alternate Form Name', 'Primary Type', 'Secondary Type', 'Evolution Details']].to_string())

print("\n" + "="*80)
print("✓ Expected result: 4 forms")
print("  1. Urshifu (Single Strike Style) - Alternate Form: Single Strike Style")
print("  2. Urshifu (Rapid Strike Style) - Alternate Form: Rapid Strike Style")
print("  3. Urshifu (Single Strike Gigantamax) - Alternate Form: Gigantamax")
print("  4. Urshifu (Rapid Strike Gigantamax) - Alternate Form: Gigantamax")
print("="*80)

FINAL CHECK - URSHIFU

Total Urshifu forms: 4

Final forms:
                            Pokemon Name  Alternate Form Name Primary Type Secondary Type   Evolution Details
1070       Urshifu (Single Strike Style)  Single Strike Style     Fighting           Dark  Scroll of Darkness
1071        Urshifu (Rapid Strike Style)   Rapid Strike Style     Fighting          Water    Scroll of Waters
1072  Urshifu (Single Strike Gigantamax)           Gigantamax     Fighting           Dark                 NaN
1073   Urshifu (Rapid Strike Gigantamax)           Gigantamax     Fighting          Water                 NaN

✓ Expected result: 4 forms
  1. Urshifu (Single Strike Style) - Alternate Form: Single Strike Style
  2. Urshifu (Rapid Strike Style) - Alternate Form: Rapid Strike Style
  3. Urshifu (Single Strike Gigantamax) - Alternate Form: Gigantamax
  4. Urshifu (Rapid Strike Gigantamax) - Alternate Form: Gigantamax
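
The check above prints the expected result next to the actual one and relies on a visual comparison. A hedged alternative is to assert the expectation, so that a rerun fails loudly if the rows change. In this sketch the `urshifu` frame is a hypothetical hand-typed copy of the four rows shown in the output, and `EXPECTED` is an illustrative name.

```python
import pandas as pd

# Hypothetical copy of the four rows from the verification output
urshifu = pd.DataFrame({
    "Pokemon Name": [
        "Urshifu (Single Strike Style)",
        "Urshifu (Rapid Strike Style)",
        "Urshifu (Single Strike Gigantamax)",
        "Urshifu (Rapid Strike Gigantamax)",
    ],
    "Alternate Form Name": [
        "Single Strike Style", "Rapid Strike Style", "Gigantamax", "Gigantamax",
    ],
    "Secondary Type": ["Dark", "Water", "Dark", "Water"],
})

# Expected (form, secondary type) pairs after the correction
EXPECTED = {
    ("Single Strike Style", "Dark"),
    ("Rapid Strike Style", "Water"),
    ("Gigantamax", "Dark"),
    ("Gigantamax", "Water"),
}

found = set(zip(urshifu["Alternate Form Name"], urshifu["Secondary Type"]))
assert len(urshifu) == 4, f"expected 4 Urshifu rows, found {len(urshifu)}"
assert found == EXPECTED, f"unexpected Urshifu forms: {found ^ EXPECTED}"
print("Urshifu forms match the expected set")
```

In the notebook itself, the same assertions could be applied to `urshifu_final` instead of a hand-typed frame.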


**Detailed analysis of specific Pokémon with multiple forms**

In [41]:
# Specific Pokémon selected for detailed analysis
pokemon_destaque = [
    'Basculin', 'Exeggutor', 'Gourgeist', 'Lilligant', 
    'Marowak', 'Minior', 'Samurott',  'Sliggoo', 'Squawkabilly', 
    'Toxtricity', 'Vivillon', 'Raichu', 'Typhlosion', 'Braviary', 'Decidueye' 
]

print("="*80)
print("DETAILED ANALYSIS OF SPECIFIC POKÉMON WITH MULTIPLE FORMS")
print("="*80)

# Columns of interest (kept for reference; the full rows are printed below)
colunas_exibir = [
    'Pokemon Name', 'Alternate Form Name', 
    'Health Stat', 'Attack Stat', 'Defense Stat', 
    'Special Attack Stat', 'Special Defense Stat', 'Speed Stat',
    'Base Stat Total', 'Primary Ability', 'Secondary Ability', 'Hidden Ability'
]

stats_cols = ['Health Stat', 'Attack Stat', 'Defense Stat', 
              'Special Attack Stat', 'Special Defense Stat', 'Speed Stat', 'Base Stat Total']
ability_cols = ['Primary Ability', 'Secondary Ability', 'Hidden Ability']

# Pokémon whose forms differ in at least one stat or ability
pokemon_para_verificar = []

for pokemon in pokemon_destaque:
    formas = df[df['Original Name'] == pokemon]
    
    if len(formas) > 0:
        print(f"\n{'='*80}")
        print(f"🔍 {pokemon}")
        print(f"{'='*80}")
        print(f"Total forms in the dataset: {len(formas)}")
        print(f"\nForms present: {formas['Alternate Form Name'].tolist()}")
        
        print("\nFull details of the forms:")
        print(formas.to_string(index=False))
        
        if len(formas) > 1:
            print("\n  📊 Difference analysis:")
            for col in stats_cols + ability_cols:
                if formas[col].nunique() > 1:
                    print(f"     • {col}: differs between forms")
                    if pokemon not in pokemon_para_verificar:
                        pokemon_para_verificar.append(pokemon)
    else:
        # Reached only when the Pokémon has no rows at all
        print(f"\n⚠️  {pokemon}: not found in the current dataset")

print("\n" + "="*80)

DETAILED ANALYSIS OF SPECIFIC POKÉMON WITH MULTIPLE FORMS

🔍 Basculin
Total forms in the dataset: 3

Forms present: ['None', 'White-Striped', 'Blue-Striped']

Full details of the forms:
 Pokedex Number             Pokemon Name  Classification Alternate Form Name Legendary Type  Pokemon Height  Pokemon Weight Primary Type Secondary Type Primary Ability Secondary Ability Hidden Ability  Male Ratio  Female Ratio  Health Stat  Attack Stat  Defense Stat  Special Attack Stat  Special Defense Stat  Speed Stat  Base Stat Total  Catch Rate Experience Growth  Experience Growth Total Primary Egg Group Evolution Details Original Name
            550                 Basculin Hostile Pokémon                None            NaN             1.0            18.0        Water            NaN        Reckless      Adaptability   Mold Breaker        50.0          50.0           70           92            65                   80                    55          98              460          25 

In [42]:
print("="*80)
print("POST-CORRECTION CHECK")
print("="*80)

if 'pokemon_destaque' in locals() and len(pokemon_destaque) > 0:
    print(f"\nChecking {len(pokemon_destaque)} Pokémon from the pokemon_destaque list...")
    
    for pokemon in pokemon_destaque:
        formas = df[df['Original Name'] == pokemon]
        if len(formas) > 0:
            print(f"\n{pokemon}:")
            print(f"  Total forms: {len(formas)}")
            print(f"  Forms: {formas['Alternate Form Name'].tolist()}")
        else:
            print(f"\n{pokemon}: Not found in the dataset")
else:
    print("\nVariable pokemon_destaque is not defined.")

print("\n" + "="*80)

POST-CORRECTION CHECK

Checking 15 Pokémon from the pokemon_destaque list...

Basculin:
  Total forms: 3
  Forms: ['None', 'White-Striped', 'Blue-Striped']

Exeggutor:
  Total forms: 3
  Forms: ['None', 'Alola', 'Alola']

Gourgeist:
  Total forms: 4
  Forms: ['None', 'Small Size', 'Large Size', 'Super Size']

Lilligant:
  Total forms: 3
  Forms: ['None', 'Hisui', 'Hisui']

Marowak:
  Total forms: 3
  Forms: ['None', 'Alola', 'Alola']

Minior:
  Total forms: 8
  Forms: ['None', 'Red Core', 'Orange Core', 'Yellow Core', 'Green Core', 'Blue Core', 'Indigo Core', 'Violet Core']

Samurott:
  Total forms: 3
  Forms: ['None', 'Hisui', 'Hisui']

Sliggoo:
  Total forms: 3
  Forms: ['None', 'Hisui', 'Hisui']

Squawkabilly:
  Total forms: 3
  Forms: ['None', 'Yellow Plumage', 'White Plumage']

Toxtricity:
  Total forms: 4
  Forms: ['Amped', 'Low Key', 'Gigantamax', 'Gigantamax']

Vivillon:
  Total forms: 9
  Forms: ['None', 'River Pat

**Specific corrections for unnecessary repeated forms**

In [43]:
print("="*80)
print("APPLYING SPECIFIC CORRECTIONS FOR REPEATED FORMS")
print("="*80)

print(f"\nPokémon to be processed: {pokemon_destaque}")
print(f"Total: {len(pokemon_destaque)}")

print("\n" + "="*80)

indices_para_remover = []

# 1. BASCULIN: keep only Basculin Red/Blue Striped and Basculin White Striped
if 'Basculin' in pokemon_destaque:
    print("\n1. Basculin - Removing Blue Striped...")
    basculin_blue = df[(df['Original Name'] == 'Basculin') & (df['Alternate Form Name'] == 'Blue-Striped')]
    if len(basculin_blue) > 0:
        indices_para_remover.extend(basculin_blue.index.tolist())
        print(f"   ✓ {len(basculin_blue)} Basculin Blue Striped record(s) marked for removal")
    else:
        print("     Basculin Blue Striped not found")
        
    # Relabel the default ('None') form as Red/Blue Striped
    basculin_none = df[(df['Pokemon Name'].str.contains('Basculin', na=False)) & (df['Alternate Form Name'] == 'None')]
    if len(basculin_none) > 0:
        for idx in basculin_none.index:
            df.loc[idx, 'Alternate Form Name'] = 'Red/Blue Striped'
            if '(Red Striped)' in df.loc[idx, 'Pokemon Name']:
                df.loc[idx, 'Pokemon Name'] = df.loc[idx, 'Pokemon Name'].replace('(Red Striped)', '(Red/Blue Striped)')
        

# 2. MINIOR: merge all Core forms into a single row and rename the 'None' form to Meteor
if 'Minior' in pokemon_destaque:
    print("\n2. Minior - Merging Core forms...")
    minior_cores = df[(df['Original Name'] == 'Minior') & (df['Alternate Form Name'].str.contains('Core', na=False))]
    if len(minior_cores) > 1:
        # Keep only the first Core form and rename it to "Core form"
        primeira_core = minior_cores.index[0]
        df.loc[primeira_core, 'Alternate Form Name'] = 'Core form'
        df.loc[primeira_core, 'Pokemon Name'] = 'Minior (Core form)'
        
        # Mark the other Core forms for removal
        outras_cores = minior_cores.index[1:].tolist()
        indices_para_remover.extend(outras_cores)
        print(f"   ✓ {len(outras_cores)} additional Core form(s) marked for removal")
        print("   ✓ Merged form kept: Core form")
    elif len(minior_cores) == 1:
        # Only rename when there is a single Core form
        df.loc[minior_cores.index[0], 'Alternate Form Name'] = 'Core form'
        df.loc[minior_cores.index[0], 'Pokemon Name'] = 'Minior (Core form)'
        print("   ✓ Core form renamed")
    else:
        print("     No Minior Core form found")
    
    minior_none = df[(df['Pokemon Name'].str.contains('Minior', na=False, regex=False)) & (df['Alternate Form Name'] == 'None')]

    if len(minior_none) > 0:
        for idx in minior_none.index:
            df.loc[idx, 'Alternate Form Name'] = 'Meteor Form'
            if '(' not in df.loc[idx, 'Pokemon Name']:
                df.loc[idx, 'Pokemon Name'] = "Minior (Meteor)"

# 3. POKÉMON WITH (Alola): drop variants without Evolution Details - only for the pokemon_destaque list
print("\n3. Pokémon (Alola) - Removing forms without Evolution Details...")
pokemon_alola = df[df['Alternate Form Name'] == 'Alola']
for idx, row in pokemon_alola.iterrows():
    if row['Original Name'] in pokemon_destaque:
        if pd.isna(row['Evolution Details']) or row['Evolution Details'] == '':
            indices_para_remover.append(idx)
            print(f"   ✓ {row['Original Name']} (Alola) without Evolution Details marked for removal")

# 4. POKÉMON WITH (Hisui): drop variants without Evolution Details - only for the pokemon_destaque list
print("\n4. Pokémon (Hisui) - Removing forms without Evolution Details...")
pokemon_hisui = df[df['Alternate Form Name'] == 'Hisui']
for idx, row in pokemon_hisui.iterrows():
    if row['Original Name'] in pokemon_destaque:
        if pd.isna(row['Evolution Details']) or row['Evolution Details'] == '':
            indices_para_remover.append(idx)
            print(f"   ✓ {row['Original Name']} (Hisui) without Evolution Details marked for removal")

# 5. SQUAWKABILLY: reduce to two rows, split by Hidden Ability
if 'Squawkabilly' in pokemon_destaque:
    print("\n5. Squawkabilly - Regrouping forms by Hidden Ability...")
    squawkabilly = df[df['Original Name'] == 'Squawkabilly']
    if len(squawkabilly) > 0:
        # Identify the forms by Hidden Ability
        guts_forms = squawkabilly[squawkabilly['Hidden Ability'] == 'Guts']
        sheer_force_forms = squawkabilly[squawkabilly['Hidden Ability'] == 'Sheer Force']
        
        if len(guts_forms) > 0:
            # Keep only the first Guts row and rename it
            primeira_guts = guts_forms.index[0]
            df.loc[primeira_guts, 'Alternate Form Name'] = 'Green/Blue Plumage'
            df.loc[primeira_guts, 'Pokemon Name'] = 'Squawkabilly (Green/Blue Plumage)'
            
            # Mark the other Guts rows for removal
            if len(guts_forms) > 1:
                outras_guts = guts_forms.index[1:].tolist()
                indices_para_remover.extend(outras_guts)
                print(f"   ✓ Green/Blue Plumage form (Guts) kept, {len(outras_guts)} duplicate(s) marked for removal")
        
        if len(sheer_force_forms) > 0:
            # Keep only the first Sheer Force row and rename it
            primeira_sheer = sheer_force_forms.index[0]
            df.loc[primeira_sheer, 'Alternate Form Name'] = 'White/Yellow Plumage'
            df.loc[primeira_sheer, 'Pokemon Name'] = 'Squawkabilly (White/Yellow Plumage)'
            
            # Mark the other Sheer Force rows for removal
            if len(sheer_force_forms) > 1:
                outras_sheer = sheer_force_forms.index[1:].tolist()
                indices_para_remover.extend(outras_sheer)
                print(f"   ✓ White/Yellow Plumage form (Sheer Force) kept, {len(outras_sheer)} duplicate(s) marked for removal")
    else:
        print("     Squawkabilly not found")

# 6. VIVILLON: drop every alternate form, keeping only 'None'
if 'Vivillon' in pokemon_destaque:
    print("\n6. Vivillon - Removing all alternate forms...")
    vivillon_alternativas = df[(df['Original Name'] == 'Vivillon') & (df['Alternate Form Name'] != 'None')]
    if len(vivillon_alternativas) > 0:
        indices_para_remover.extend(vivillon_alternativas.index.tolist())
        print(f"   ✓ {len(vivillon_alternativas)} Vivillon alternate form(s) marked for removal")
    else:
        print("     No Vivillon alternate form found")

print("\n" + "="*80)
print(f"TOTAL RECORDS MARKED FOR REMOVAL: {len(indices_para_remover)}")
print("="*80)

APPLYING SPECIFIC CORRECTIONS FOR REPEATED FORMS

Pokémon to be processed: ['Basculin', 'Exeggutor', 'Gourgeist', 'Lilligant', 'Marowak', 'Minior', 'Samurott', 'Sliggoo', 'Squawkabilly', 'Toxtricity', 'Vivillon', 'Raichu', 'Typhlosion', 'Braviary', 'Decidueye']
Total: 15


1. Basculin - Removing Blue Striped...
   ✓ 1 Basculin Blue Striped record(s) marked for removal

2. Minior - Merging Core forms...
   ✓ 6 additional Core form(s) marked for removal
   ✓ Merged form kept: Core form

3. Pokémon (Alola) - Removing forms without Evolution Details...
   ✓ Raichu (Alola) without Evolution Details marked for removal
   ✓ Exeggutor (Alola) without Evolution Details marked for removal
   ✓ Marowak (Alola) without Evolution Details marked for removal

4. Pokémon (Hisui) - Removing forms without Evolution Details...
   ✓ Typhlosion (Hisui) without Evolution Details marked for removal
   ✓ Samurott (Hisui) without Evolution Details mark

In [44]:
# Apply the removals
print(f"\nRecords before removal: {len(df)}")
df = df.drop(index=indices_para_remover)
df = df.reset_index(drop=True)
print(f"Records after removal: {len(df)}")
print(f"Total removed: {len(indices_para_remover)}")

print("\n✓ Corrections applied successfully!")
print("="*80)


Records before removal: 1224
Records after removal: 1199
Total removed: 25

✓ Corrections applied successfully!
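
Steps 3 and 4 of the correction cell share the same rule: a regional form (Alola or Hisui) of a Pokémon in `pokemon_destaque` is dropped when its `Evolution Details` is missing or empty. The two loops could be folded into one boolean mask, as in the sketch below. The helper `regional_rows_without_evolution`, the `watchlist` variable, and the `demo` rows are hypothetical; only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration; only the column names come from the notebook
demo = pd.DataFrame({
    "Original Name": ["Raichu", "Raichu", "Raichu", "Pikachu"],
    "Alternate Form Name": ["None", "Alola", "Alola", "None"],
    "Evolution Details": ["Thunder Stone", "Thunder Stone in Alola", None, ""],
})
watchlist = ["Raichu"]  # stands in for pokemon_destaque

def regional_rows_without_evolution(frame, names, regions=("Alola", "Hisui")):
    """Return index labels of regional forms in `names` that lack Evolution Details."""
    details = frame["Evolution Details"]
    mask = (
        frame["Alternate Form Name"].isin(regions)
        & frame["Original Name"].isin(names)
        & (details.isna() | (details == ""))
    )
    return frame.index[mask].tolist()

to_drop = regional_rows_without_evolution(demo, watchlist)
print(to_drop)
```

Only the Alola row with a missing `Evolution Details` value is selected; the Alola row that records its evolution method is kept.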


In [45]:
print("="*80)
print("POST-CORRECTION CHECK")
print("="*80)

if 'pokemon_destaque' in locals() and len(pokemon_destaque) > 0:
    print(f"\nChecking {len(pokemon_destaque)} Pokémon from the pokemon_destaque list...")
    
    for pokemon in pokemon_destaque:
        formas = df[df['Original Name'] == pokemon]
        if len(formas) > 0:
            print(f"\n{pokemon}:")
            print(f"  Total forms: {len(formas)}")
            print(f"  Forms: {formas['Alternate Form Name'].tolist()}")
        else:
            print(f"\n{pokemon}: Not found in the dataset")
else:
    print("\nVariable pokemon_destaque is not defined.")

print("\n" + "="*80)

POST-CORRECTION CHECK

Checking 15 Pokémon from the pokemon_destaque list...

Basculin:
  Total forms: 2
  Forms: ['Red/Blue Striped', 'White-Striped']

Exeggutor:
  Total forms: 2
  Forms: ['None', 'Alola']

Gourgeist:
  Total forms: 4
  Forms: ['None', 'Small Size', 'Large Size', 'Super Size']

Lilligant:
  Total forms: 2
  Forms: ['None', 'Hisui']

Marowak:
  Total forms: 2
  Forms: ['None', 'Alola']

Minior:
  Total forms: 2
  Forms: ['Meteor Form', 'Core form']

Samurott:
  Total forms: 2
  Forms: ['None', 'Hisui']

Sliggoo:
  Total forms: 2
  Forms: ['None', 'Hisui']

Squawkabilly:
  Total forms: 2
  Forms: ['Green/Blue Plumage', 'White/Yellow Plumage']

Toxtricity:
  Total forms: 4
  Forms: ['Amped', 'Low Key', 'Gigantamax', 'Gigantamax']

Vivillon:
  Total forms: 1
  Forms: ['None']

Raichu:
  Total forms: 2
  Forms: ['None', 'Alola']

Typhlosion:
  Total forms: 2
  Forms: ['None', 'Hisui']

Braviary:
  To

---

## 3.4 Fill null values in the "Evolution Details" column

In [47]:
df['Evolution Details'].unique()

array([nan, 'Level 16', 'Level 32', 'Level 36', 'Level 7', 'Level 10',
       'Level 18', 'Level 20', 'Level 22', 'Thunder Stone',
       'Thunder Stone in Alola', 'Ice Stone', 'Moon Stone', 'Fire Stone',
       'Level 21', 'Leaf Stone', 'Level 24', 'Level 31',
       'Level 26 In Alola', 'Level 26', 'Level 28', 'Level 33',
       'Level 25', 'Water Stone', 'Level 30', 'Level 40', 'Galarica Cuff',
       'Level 37', 'Level 34', 'Level 38', 'Leaf Stone in Alola',
       'Level 28 At Night in Alola', 'Level 20 With Attack > Defense',
       'Level 20 With Attack < Defense', 'Level 35', 'Level 35 in Galar',
       'Level 42', 'Holding An Oval Stone During The Day',
       'Knowing Mimic', 'Knowing Mimic in Galar', 'Level 55', 'Level 14',
       'Level 36 in Hisui', 'Level 15', 'Level 27', 'Sun Stone',
       "King's Rock", 'During The Day', 'During The Night',
       'Galarica Wreath', 'Metal Coat', 'Level 23',
       'With Remoraid In The Party', 'Dragon Scale', 'Upgrade',
       'Level 

In [48]:
df['Evolution Details'] = df['Evolution Details'].fillna('No Evolution')

print("✓ Null values in 'Evolution Details' filled with 'No Evolution'!")
print(f"\nRemaining null values: {df['Evolution Details'].isnull().sum()}")
print(f"\nTotal records with 'No Evolution': {(df['Evolution Details'] == 'No Evolution').sum()}")

✓ Null values in 'Evolution Details' filled with 'No Evolution'!

Remaining null values: 0

Total records with 'No Evolution': 703


---

## 3.5 Fill null values in the "Secondary Ability" column

In [49]:
df['Secondary Ability'] = df['Secondary Ability'].fillna('None')

print("✓ Null values in 'Secondary Ability' filled with 'None'!")
print(f"\nRemaining null values: {df['Secondary Ability'].isnull().sum()}")
print(f"\nTotal records with 'None': {(df['Secondary Ability'] == 'None').sum()}")

✓ Null values in 'Secondary Ability' filled with 'None'!

Remaining null values: 0

Total records with 'None': 620


---

## 3.6 Fill null values in the "Hidden Ability" column

In [50]:
df['Hidden Ability'] = df['Hidden Ability'].fillna('None')

print("✓ Null values in 'Hidden Ability' filled with 'None'!")
print(f"\nRemaining null values: {df['Hidden Ability'].isnull().sum()}")
print(f"\nTotal records with 'None': {(df['Hidden Ability'] == 'None').sum()}")

✓ Null values in 'Hidden Ability' filled with 'None'!

Remaining null values: 0

Total records with 'None': 278


---

## 3.7 Fill null values in the "Secondary Type" column

In [51]:
df['Secondary Type'] = df['Secondary Type'].fillna('None')

print("✓ Null values in 'Secondary Type' filled with 'None'!")
print(f"\nRemaining null values: {df['Secondary Type'].isnull().sum()}")
print(f"\nTotal records with 'None': {(df['Secondary Type'] == 'None').sum()}")

✓ Null values in 'Secondary Type' filled with 'None'!

Remaining null values: 0

Total records with 'None': 537
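
Sections 3.4 to 3.7 repeat the same fill pattern for four columns. As a sketch of an equivalent single step, `DataFrame.fillna` accepts a dictionary mapping each column to its placeholder. The `toy` frame and the `fill_values` name below are illustrative, not part of the notebook:

```python
import numpy as np
import pandas as pd

# Toy frame with the four columns handled in sections 3.4 to 3.7 (hypothetical sample data).
toy = pd.DataFrame({
    "Evolution Details": ["Level 16", np.nan],
    "Secondary Ability": [np.nan, "Rain Dish"],
    "Hidden Ability": ["Chlorophyll", np.nan],
    "Secondary Type": ["Poison", np.nan],
})

# One placeholder per column, matching the values used in the notebook.
fill_values = {
    "Evolution Details": "No Evolution",
    "Secondary Ability": "None",
    "Hidden Ability": "None",
    "Secondary Type": "None",
}

# Count the nulls before filling so the report still shows how many were replaced.
filled_counts = toy[list(fill_values)].isna().sum()
toy = toy.fillna(value=fill_values)

print(filled_counts)
print(toy)
```

Keeping the placeholders in one dictionary makes the choice of sentinel per column easy to audit and change.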


---

## 3.8 Handle null values in the "Legendary Type" column

In [52]:
print("="*80)
print("COLUMNS WITH REMAINING NULL VALUES")
print("="*80)

# Null count and percentage per column
null_counts = df.isnull().sum()
null_percentages = (df.isnull().sum() / len(df) * 100).round(2)

# Build a DataFrame summarizing the null values
null_info = pd.DataFrame({
    'Column': null_counts.index,
    'Null Values': null_counts.values,
    'Percent (%)': null_percentages.values
})

null_info_filtered = null_info[null_info['Null Values'] > 0].sort_values('Null Values', ascending=False)

print(f"\nTotal records in the dataset: {len(df)}")
print(f"\nColumns with null values: {len(null_info_filtered)}")

if len(null_info_filtered) > 0:
    print("\n" + "="*80)
    print("DETAILS OF COLUMNS WITH NULL VALUES:")
    print("="*80)
    print(null_info_filtered.to_string(index=False))
else:
    print("\n✓ NO COLUMN HAS NULL VALUES!")

print("\n" + "="*80)
print(f"Total null values in the dataset: {df.isnull().sum().sum()}")
print(f"Overall percentage of null values: {(df.isnull().sum().sum() / (df.shape[0] * df.shape[1]) * 100):.2f}%")
print("="*80)

COLUMNS WITH REMAINING NULL VALUES

Total records in the dataset: 1199

Columns with null values: 1

DETAILS OF COLUMNS WITH NULL VALUES:
        Column  Null Values  Percent (%)
Legendary Type         1051        87.66

Total null values in the dataset: 1051
Overall percentage of null values: 3.25%


**Standardizing the 'Legendary Type' column**

In [53]:
# Fold "Sub-Legendary" into "Legendary" so that only "Mythical" and "Legendary" remain
print("\nUnique values before standardization:")
print(df['Legendary Type'].value_counts())

# Map every non-null variation onto one of the two main categories;
# null rows (ordinary Pokémon) are left untouched
df.loc[df['Legendary Type'].notna(), 'Legendary Type'] = df.loc[df['Legendary Type'].notna(), 'Legendary Type'].apply(
    lambda x: 'Mythical' if 'Mythical' in x else ('Legendary' if 'Legendary' in x else x)
)

print("\n" + "="*80)
print("Unique values after standardization:")
print(df['Legendary Type'].value_counts())

print("\n✓ Column 'Legendary Type' standardized successfully!")
print("="*80)


Unique values before standardization:

Legendary Type
Sub-Legendary    68
Legendary        49
Mythical         31
Name: count, dtype: int64

Unique values after standardization:
Legendary Type
Legendary    117
Mythical      31
Name: count, dtype: int64

✓ Column 'Legendary Type' standardized successfully!
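
Since the only variation in the output above is `Sub-Legendary`, the same result can be sketched with an explicit mapping instead of a substring test. This is an alternative under that assumption, not the notebook's method; the `legendary` series is toy data:

```python
import pandas as pd

# Toy series with the three labels seen in the output above plus a missing value.
legendary = pd.Series(["Sub-Legendary", "Legendary", "Mythical", None], name="Legendary Type")

# An explicit mapping only touches the listed labels; anything unexpected
# stays visible in value_counts() instead of being silently absorbed.
standardized = legendary.replace({"Sub-Legendary": "Legendary"})

print(standardized.value_counts(dropna=False))
```

The substring approach in the cell above is more permissive; the mapping is stricter and fails visibly if a new label appears in a later version of the dataset.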


**New categories in Legendary Type**

In [54]:
# Classify Pokémon as "Pseudo-Legendary" based on specific conditions
print("="*80)
print("PSEUDO-LEGENDARY CLASSIFICATION")
print("="*80)

# Condition 1: Base Stat Total = 600, Experience Growth = "Slow",
# Experience Growth Total = 1,250,000, base form only,
# and not already labeled Legendary or Mythical
pseudo_legendary_condition = (
    (df['Base Stat Total'] == 600) & 
    (df['Experience Growth'] == 'Slow') & 
    (df['Experience Growth Total'] == 1250000) &
    (df['Alternate Form Name'] == 'None') &
    (df['Legendary Type'] != 'Legendary') &
    (df['Legendary Type'] != 'Mythical') 
)

pseudo_legendary_pokemon = df[pseudo_legendary_condition]

print("\nPokémon identified as Pseudo-Legendary by the base conditions:")
print(f"Total: {len(pseudo_legendary_pokemon)}")

if len(pseudo_legendary_pokemon) > 0:
    print("\nPokémon list:")
    print(pseudo_legendary_pokemon[['Pokemon Name', 'Base Stat Total', 'Experience Growth', 'Experience Growth Total', 'Pokedex Number']].to_string(index=False))
    
    # Apply the label to the Pokémon that meet the conditions
    df.loc[pseudo_legendary_condition, 'Legendary Type'] = 'Pseudo-Legendary'
    print(f"\n✓ {len(pseudo_legendary_pokemon)} Pokémon classified as Pseudo-Legendary")
    
    # Condition 2: also label the complete evolutionary line
    print("\n" + "="*80)
    print("CLASSIFYING THE PSEUDO-LEGENDARY EVOLUTIONARY LINES")
    print("="*80)
    
    # Pokedex Numbers of the Pseudo-Legendary Pokémon identified above
    pseudo_legendary_ids = set(pseudo_legendary_pokemon['Pokedex Number'].unique())
    total_evolutivos = 0
    
    # For each Pseudo-Legendary, look up its pre-evolutions by subtracting 1 and 2
    # from the Pokedex Number. This is a heuristic: it assumes each line occupies
    # three consecutive Pokedex Numbers.
    all_evolutionary_ids = set(pseudo_legendary_ids)
    
    for pseudo_id in pseudo_legendary_ids:
        print(f"\nAnalyzing the evolutionary line of Pseudo-Legendary Pokedex #{pseudo_id}:")
        
        # Look up the name of the Pseudo-Legendary
        pseudo_name = df[df['Pokedex Number'] == pseudo_id]['Pokemon Name'].iloc[0]
        print(f"  Final Pokémon: {pseudo_name}")
        
        # Look for pre-evolutions at Pokedex Number - 1 and Pokedex Number - 2
        evolutionary_line = []
        
        # Check Pokedex Number - 2 (first stage)
        pre_evo_2 = pseudo_id - 2
        pre_evo_2_pokemon = df[(df['Pokedex Number'] == pre_evo_2) & (df['Alternate Form Name'] == 'None')]
        
        if len(pre_evo_2_pokemon) > 0:
            pre_evo_2_name = pre_evo_2_pokemon.iloc[0]['Pokemon Name']
            pre_evo_2_bst = pre_evo_2_pokemon.iloc[0]['Base Stat Total']
            evolutionary_line.append((pre_evo_2, pre_evo_2_name, pre_evo_2_bst))
            all_evolutionary_ids.add(pre_evo_2)
            print(f"  → Pre-evolution -2: {pre_evo_2_name} (#{pre_evo_2}, BST: {pre_evo_2_bst})")
        
        # Check Pokedex Number - 1 (second stage)
        pre_evo_1 = pseudo_id - 1
        pre_evo_1_pokemon = df[(df['Pokedex Number'] == pre_evo_1) & (df['Alternate Form Name'] == 'None')]
        
        if len(pre_evo_1_pokemon) > 0:
            pre_evo_1_name = pre_evo_1_pokemon.iloc[0]['Pokemon Name']
            pre_evo_1_bst = pre_evo_1_pokemon.iloc[0]['Base Stat Total']
            evolutionary_line.append((pre_evo_1, pre_evo_1_name, pre_evo_1_bst))
            all_evolutionary_ids.add(pre_evo_1)
            print(f"  → Pre-evolution -1: {pre_evo_1_name} (#{pre_evo_1}, BST: {pre_evo_1_bst})")
        
        # Print the complete evolutionary line
        if len(evolutionary_line) > 0:
            print("  Complete evolutionary line: ", end="")
            for i, (pid, pname, pbst) in enumerate(evolutionary_line):
                print(f"{pname} (#{pid})", end="")
                if i < len(evolutionary_line) - 1:
                    print(" → ", end="")
                else:
                    print(f" → {pseudo_name} (#{pseudo_id})")
        else:
            print(f"  ⚠️ No pre-evolution found at Pokedex Numbers #{pre_evo_2} and #{pre_evo_1}")
    
    print("\n" + "="*80)
    print("APPLYING CLASSIFICATION")
    print("="*80)
    
    for pokedex_id in all_evolutionary_ids:
        # All rows with this Pokedex Number, alternate forms included
        pokemon_rows = df[df['Pokedex Number'] == pokedex_id]
        
        for idx in pokemon_rows.index:
            current_type = df.loc[idx, 'Legendary Type']
            
            # Label the row unless it is already Legendary, Mythical, or Pseudo-Legendary
            if pd.isna(current_type) or current_type not in ['Legendary', 'Mythical', 'Pseudo-Legendary']:
                df.loc[idx, 'Legendary Type'] = 'Pseudo-Legendary'
                if pokedex_id not in pseudo_legendary_ids:
                    print(f"  ✓ {df.loc[idx, 'Pokemon Name']} (#{pokedex_id}) classified as Pseudo-Legendary")
                    total_evolutivos += 1
    
    print(f"\n✓ {total_evolutivos} additional Pokémon classified as Pseudo-Legendary (evolutionary line)")
    
    print("\n" + "="*80)
    print("FINAL RESULT - PSEUDO-LEGENDARY")
    print("="*80)
    
    all_pseudo_legendary = df[df['Legendary Type'] == 'Pseudo-Legendary'].sort_values('Pokedex Number')
    print(f"\nTotal Pokémon classified as Pseudo-Legendary: {len(all_pseudo_legendary)}")
    print("\nComplete list:")
    print(all_pseudo_legendary[['Pokemon Name', 'Pokedex Number', 'Base Stat Total', 'Experience Growth']].to_string(index=False))
else:
    print("\n⚠️ No Pokémon found matching the specified conditions")

print("\n" + "="*80)

PSEUDO-LEGENDARY CLASSIFICATION

Pokémon identified as Pseudo-Legendary by the base conditions:
Total: 10

Pokémon list:
Pokemon Name  Base Stat Total Experience Growth  Experience Growth Total  Pokedex Number
   Dragonite              600              Slow                  1250000             149
   Tyranitar              600              Slow                  1250000             248
   Salamence              600              Slow                  1250000             373
   Metagross              600              Slow                  1250000             376
    Garchomp              600              Slow                  1250000             445
   Hydreigon              600              Slow                  1250000             635
      Goodra              600              Slow                  1250000             706
     Kommo-o              600              Slow                  1250000             784
   Dragapult              600              Slow                
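
The cell above walks each evolutionary line with explicit loops. As a compact sketch of the same heuristic (final stage by stats, then Pokedex Number minus 1 and minus 2), the labeling can be expressed with boolean masks and `isin`. The `toy` frame is hypothetical, and the `Experience Growth Total` check is omitted only to keep the example small:

```python
import numpy as np
import pandas as pd

# Toy data: one pseudo-legendary line plus neighbors that must not be relabeled.
toy = pd.DataFrame({
    "Pokemon Name": ["Snorlax", "Dratini", "Dragonair", "Dragonite", "Mewtwo", "Mew"],
    "Pokedex Number": [143, 147, 148, 149, 150, 151],
    "Alternate Form Name": ["None"] * 6,
    "Base Stat Total": [540, 300, 420, 600, 680, 600],
    "Experience Growth": ["Slow", "Slow", "Slow", "Slow", "Slow", "Medium Slow"],
    "Legendary Type": [np.nan, np.nan, np.nan, np.nan, "Legendary", "Mythical"],
})

# Rows that already carry a protected label are never overwritten.
protected = toy["Legendary Type"].isin(["Legendary", "Mythical"])

# Final stages: BST 600, Slow growth, base form, not protected.
final_mask = (
    (toy["Base Stat Total"] == 600)
    & (toy["Experience Growth"] == "Slow")
    & (toy["Alternate Form Name"] == "None")
    & ~protected
)
final_ids = toy.loc[final_mask, "Pokedex Number"]

# Heuristic from the notebook: the line occupies three consecutive Pokedex Numbers.
line_ids = pd.concat([final_ids, final_ids - 1, final_ids - 2]).unique()

line_mask = toy["Pokedex Number"].isin(line_ids) & ~protected
toy.loc[line_mask, "Legendary Type"] = "Pseudo-Legendary"

print(toy[["Pokemon Name", "Legendary Type"]])
```

Like the original loop, this relies on contiguous Pokedex Numbers. It would mislabel neighbors if a future pseudo-legendary line were not numbered consecutively, so the printed list should still be reviewed.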

In [55]:
# Check the complete evolutionary lines of the Pseudo-Legendary Pokémon
print("="*80)
print("COMPLETE EVOLUTIONARY LINES - PSEUDO-LEGENDARY")
print("="*80)

pseudo_df = df[df['Legendary Type'] == 'Pseudo-Legendary'].copy()

if len(pseudo_df) > 0:
    # Identify the final-stage Pokémon: the rows that meet all of the
    # original criteria (BST = 600, Slow growth, base form)
    final_stage_condition = (
        (pseudo_df['Base Stat Total'] == 600) & 
        (pseudo_df['Experience Growth'] == 'Slow') & 
        (pseudo_df['Experience Growth Total'] == 1250000) &
        (pseudo_df['Alternate Form Name'] == 'None')
    )
    
    final_stage_ids = pseudo_df[final_stage_condition]['Pokedex Number'].unique()
    
    print(f"\nTotal evolutionary lines: {len(final_stage_ids)}")
    
    # Print each complete evolutionary line
    for final_id in sorted(final_stage_ids):
        print("\n" + "-"*80)
        
        # Build the line using Pokedex Number arithmetic
        evolutionary_chain = []
        
        # Final Pokémon: select the base form explicitly so that an alternate
        # form sharing the same Pokedex Number is never picked by row order
        final_pokemon = pseudo_df[(pseudo_df['Pokedex Number'] == final_id) & (pseudo_df['Alternate Form Name'] == 'None')].iloc[0]
        evolutionary_chain.append({
            'name': final_pokemon['Pokemon Name'],
            'pokedex': final_pokemon['Pokedex Number'],
            'bst': final_pokemon['Base Stat Total']
        })
        
        # Check pre-evolution -1 (middle stage)
        pre_evo_1_id = final_id - 1
        pre_evo_1_data = pseudo_df[(pseudo_df['Pokedex Number'] == pre_evo_1_id) & (pseudo_df['Alternate Form Name'] == 'None')]
        
        if len(pre_evo_1_data) > 0:
            pre_evo_1 = pre_evo_1_data.iloc[0]
            evolutionary_chain.insert(0, {
                'name': pre_evo_1['Pokemon Name'],
                'pokedex': pre_evo_1['Pokedex Number'],
                'bst': pre_evo_1['Base Stat Total']
            })
        
        # Check pre-evolution -2 (first stage)
        pre_evo_2_id = final_id - 2
        pre_evo_2_data = pseudo_df[(pseudo_df['Pokedex Number'] == pre_evo_2_id) & (pseudo_df['Alternate Form Name'] == 'None')]
        
        if len(pre_evo_2_data) > 0:
            pre_evo_2 = pre_evo_2_data.iloc[0]
            evolutionary_chain.insert(0, {
                'name': pre_evo_2['Pokemon Name'],
                'pokedex': pre_evo_2['Pokedex Number'],
                'bst': pre_evo_2['Base Stat Total']
            })
        
        # Print the evolutionary chain
        print(f"Evolutionary line ({len(evolutionary_chain)} stages):")
        for i, stage in enumerate(evolutionary_chain, 1):
            arrow = " → " if i < len(evolutionary_chain) else ""
            print(f"  {i}. {stage['name']} (#{stage['pokedex']}, BST: {stage['bst']}){arrow}", end="")
        print()

print("\n" + "="*80)

COMPLETE EVOLUTIONARY LINES - PSEUDO-LEGENDARY

Total evolutionary lines: 10

--------------------------------------------------------------------------------
Evolutionary line (3 stages):
  1. Dratini (#147, BST: 300) →   2. Dragonair (#148, BST: 420) →   3. Dragonite (#149, BST: 600)

--------------------------------------------------------------------------------
Evolutionary line (3 stages):
  1. Larvitar (#246, BST: 300) →   2. Pupitar (#247, BST: 410) →   3. Tyranitar (#248, BST: 600)

--------------------------------------------------------------------------------
Evolutionary line (3 stages):
  1. Bagon (#371, BST: 300) →   2. Shelgon (#372, BST: 420) →   3. Salamence (#373, BST: 600)

--------------------------------------------------------------------------------
Evolutionary line (3 stages):
  1. Beldum (#374, BST: 300) →   2. Metang (#375, BST: 420) →   3. Metagross (#376, BST: 600)

---------------------------------------------------------------------

In [56]:
# Check the result of the standardization
print("="*80)
print("CHECK AFTER STANDARDIZATION")
print("="*80)

legendary_pokemon_updated = df[df['Legendary Type'].notna()]

print(f"\nTotal Pokémon with a Legendary Type: {len(legendary_pokemon_updated)}")
print("\nDistribution by type:")
print(legendary_pokemon_updated['Legendary Type'].value_counts())

print("\n" + "="*80)
print("Sample of Pokémon per category:")
print("="*80)

for tipo in df['Legendary Type'].dropna().unique():
    print(f"\n{tipo}:")
    amostra = df[df['Legendary Type'] == tipo][['Pokemon Name', 'Legendary Type']].head(5)
    print(amostra.to_string(index=False))
    
print("\n" + "="*80)

CHECK AFTER STANDARDIZATION

Total Pokémon with a Legendary Type: 184

Distribution by type:
Legendary Type
Legendary           117
Pseudo-Legendary     36
Mythical             31
Name: count, dtype: int64

Sample of Pokémon per category:

Legendary:
    Pokemon Name Legendary Type
        Articuno      Legendary
Articuno (Galar)      Legendary
          Zapdos      Legendary
  Zapdos (Galar)      Legendary
         Moltres      Legendary

Pseudo-Legendary:
Pokemon Name   Legendary Type
     Dratini Pseudo-Legendary
   Dragonair Pseudo-Legendary
   Dragonite Pseudo-Legendary
    Larvitar Pseudo-Legendary
     Pupitar Pseudo-Legendary

Mythical:
   Pokemon Name Legendary Type
            Mew       Mythical
         Celebi       Mythical
        Jirachi       Mythical
         Deoxys       Mythical
Deoxys (Attack)       Mythical



In [57]:
# List every Pokémon whose 'Legendary Type' column is filled in
legendary_pokemon = df[df['Legendary Type'].notna()]

print("="*80)
print("POKÉMON WITH 'LEGENDARY TYPE' FILLED IN")
print("="*80)

print(f"\nTotal legendary/special Pokémon: {len(legendary_pokemon)}")
print("\nLegendary types in the dataset:")
print(legendary_pokemon['Legendary Type'].value_counts())

print("\n" + "="*80)
print("COMPLETE LIST OF POKÉMON WITH A LEGENDARY TYPE:")
print("="*80)
print(legendary_pokemon[['Pokemon Name', 'Legendary Type', 'Alternate Form Name']].to_string(index=False))

POKÉMON WITH 'LEGENDARY TYPE' FILLED IN

Total legendary/special Pokémon: 184

Legendary types in the dataset:
Legendary Type
Legendary           117
Pseudo-Legendary     36
Mythical             31
Name: count, dtype: int64

COMPLETE LIST OF POKÉMON WITH A LEGENDARY TYPE:
                      Pokemon Name   Legendary Type  Alternate Form Name
                          Articuno        Legendary                 None
                  Articuno (Galar)        Legendary                Galar
                            Zapdos        Legendary                 None
                    Zapdos (Galar)        Legendary                Galar
                           Moltres        Legendary                 None
                   Moltres (Galar)        Legendary                Galar
                           Dratini Pseudo-Legendary                 None
                         Dragonair Pseudo-Legendary                 None
                         Dragonite Pseudo-Legendary            

In [58]:
# Define the Pokedex Number ranges (inclusive) for each group
starter_ranges = [
    (1, 9),
    (152, 160),
    (252, 260),
    (387, 395),
    (495, 503),
    (650, 658),
    (722, 730),
    (810, 818),
    (906, 914)
]

paradox_ranges = [
    (984, 995),
    (1005, 1006),
    (1009, 1010),
    (1020, 1023)
]

ultra_beast_ranges = [
    (793, 799),
    (803, 806)
]

# Helper to check whether a Pokedex Number falls inside any of the ranges
def in_ranges(pokedex_num, ranges):
    for start, end in ranges:
        if start <= pokedex_num <= end:
            return True
    return False

# Counters
starter_count = 0
paradox_count = 0
ultra_beast_count = 0

print("\nClassifying Pokémon by group...")
print("="*80)

# Classify Starters (base forms only)
print("\n1. STARTER:")
for start, end in starter_ranges:
    mask = (df['Pokedex Number'] >= start) & (df['Pokedex Number'] <= end) & (df['Alternate Form Name'] == 'None')
    
    if mask.sum() > 0:
        df.loc[mask, 'Legendary Type'] = 'Starter'
        starter_count += mask.sum()
        pokemon_list = df[mask][['Pokemon Name', 'Pokedex Number']].values
        print(f"  Pokedex #{start:04d}-{end:04d}: {mask.sum()} Pokémon classified")
        for name, pid in pokemon_list:
            print(f"    ✓ {name} (#{pid})")

# Add the special Starters (Partner and Hisui forms)
print("\n  Special Starters (Partner and Hisui):")
special_starters = [
    ('Pikachu', 'Starter'),
    ('Eevee', 'Starter'),
    ('Typhlosion', 'Hisui'),
    ('Samurott', 'Hisui'),
    ('Decidueye', 'Hisui')
]

for poke_name, form_name in special_starters:
    # str.contains does a partial match on the name, so stray quotes or spaces do not matter
    mask = (df['Pokemon Name'].str.contains(poke_name, na=False)) & (df['Alternate Form Name'].str.contains(form_name, na=False))
    
    if mask.sum() > 0:
        df.loc[mask, 'Legendary Type'] = 'Starter'
        starter_count += mask.sum()
        pokemon_data = df[mask][['Pokemon Name', 'Pokedex Number']].values
        for name, pid in pokemon_data:
            print(f"    ✓ {name} (#{pid})")
    else:
        print(f"    ⚠️ {poke_name} not found!")

print(f"\n  Total Starters classified: {starter_count}")

# Classify Paradox Pokémon (base forms only)
print("\n2. PARADOX:")
for start, end in paradox_ranges:
    mask = (df['Pokedex Number'] >= start) & (df['Pokedex Number'] <= end) & (df['Alternate Form Name'] == 'None')
    
    if mask.sum() > 0:
        df.loc[mask, 'Legendary Type'] = 'Paradox'
        paradox_count += mask.sum()
        pokemon_list = df[mask][['Pokemon Name', 'Pokedex Number']].values
        print(f"  Pokedex #{start:04d}-{end:04d}: {mask.sum()} Pokémon classified")
        for name, pid in pokemon_list:
            print(f"    ✓ {name} (#{pid})")

print(f"\n  Total Paradox classified: {paradox_count}")

# Classify Ultra Beasts (base forms only).
# Note: this overwrites any label already set on these rows, including
# 'Legendary' assigned by the earlier standardization step.
print("\n3. ULTRA BEAST:")
for start, end in ultra_beast_ranges:
    mask = (df['Pokedex Number'] >= start) & (df['Pokedex Number'] <= end) & (df['Alternate Form Name'] == 'None')

    if mask.sum() > 0:
        df.loc[mask, 'Legendary Type'] = 'Ultra Beast'
        ultra_beast_count += mask.sum()
        pokemon_list = df[mask][['Pokemon Name', 'Pokedex Number']].values
        print(f"  Pokedex #{start:04d}-{end:04d}: {mask.sum()} Pokémon classified")
        for name, pid in pokemon_list:
            print(f"    ✓ {name} (#{pid})")

print(f"\n  Total Ultra Beasts classified: {ultra_beast_count}")

# Final summary
print("\n" + "="*80)
print("CLASSIFICATION SUMMARY")
print("="*80)
print("\nTotal Pokémon classified:")
print(f"  • Starter: {starter_count}")
print(f"  • Paradox: {paradox_count}")
print(f"  • Ultra Beast: {ultra_beast_count}")
print(f"  • Grand total: {starter_count + paradox_count + ultra_beast_count}")

print("\n" + "="*80)
print("✓ Classification of additional groups completed!")
print("="*80)


Classifying Pokémon by group...

1. STARTER:
  Pokedex #0001-0009: 9 Pokémon classified
    ✓ Bulbasaur (#1)
    ✓ Ivysaur (#2)
    ✓ Venusaur (#3)
    ✓ Charmander (#4)
    ✓ Charmeleon (#5)
    ✓ Charizard (#6)
    ✓ Squirtle (#7)
    ✓ Wartortle (#8)
    ✓ Blastoise (#9)
  Pokedex #0152-0160: 9 Pokémon classified
    ✓ Chikorita (#152)
    ✓ Bayleef (#153)
    ✓ Meganium (#154)
    ✓ Cyndaquil (#155)
    ✓ Quilava (#156)
    ✓ Typhlosion (#157)
    ✓ Totodile (#158)
    ✓ Croconaw (#159)
    ✓ Feraligatr (#160)
  Pokedex #0252-0260: 9 Pokémon classified
    ✓ Treecko (#252)
    ✓ Grovyle (#253)
    ✓ Sceptile (#254)
    ✓ Torchic (#255)
    ✓ Combusken (#256)
    ✓ Blaziken (#257)
    ✓ Mudkip (#258)
    ✓ Marshtomp (#259)
    ✓ Swampert (#260)
  Pokedex #0387-0395: 9 Pokémon classified
    ✓ Turtwig (#387)
    ✓ Grotle (#388)
    ✓ Torterra (#389)
    ✓ Chimchar (#390)
    ✓ Monferno (#391)
    ✓
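
The three range loops above share the same mask logic. A minimal sketch of a reusable helper built on `Series.between`, which also reports which existing labels a range assignment would overwrite. The `range_mask` function and the `toy` frame are hypothetical, not part of the notebook:

```python
import pandas as pd

ultra_beast_ranges = [(793, 799), (803, 806)]

def range_mask(numbers: pd.Series, ranges) -> pd.Series:
    """Return True for every value that falls inside any (start, end) range."""
    mask = pd.Series(False, index=numbers.index)
    for start, end in ranges:
        # between() is inclusive on both ends by default
        mask |= numbers.between(start, end)
    return mask

# Toy rows around the Ultra Beast ranges (hypothetical sample data).
toy = pd.DataFrame({
    "Pokemon Name": ["Tapu Fini", "Nihilego", "Necrozma", "Poipole", "Zeraora"],
    "Pokedex Number": [788, 793, 800, 803, 807],
    "Legendary Type": ["Legendary", "Legendary", "Legendary", "Legendary", "Mythical"],
})

mask = range_mask(toy["Pokedex Number"], ultra_beast_ranges)

# Inspect what is about to be replaced before assigning the new label.
overwritten = toy.loc[mask, "Legendary Type"].value_counts()
toy.loc[mask, "Legendary Type"] = "Ultra Beast"

print(overwritten)
print(toy)
```

Printing `overwritten` before the assignment makes relabeling explicit, which matters here because range-based labels replace whatever the earlier steps assigned.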

In [59]:
# Check the distribution across all groups
print("="*80)
print("COMPLETE GROUP DISTRIBUTION")
print("="*80)

print(f"\nTotal Pokémon with Group filled in: {df['Legendary Type'].notna().sum()}")
print("\nDistribution by group:")
print(df['Legendary Type'].value_counts().sort_index())

print("\n" + "="*80)
print("SAMPLES FROM EACH GROUP")
print("="*80)

for grupo in df['Legendary Type'].dropna().unique():
    print(f"\n{grupo}:")
    amostra = df[df['Legendary Type'] == grupo][['Pokemon Name', 'Pokedex Number', 'Legendary Type']].sort_values('Pokedex Number').head(10)
    print(amostra.to_string(index=False))

print("\n" + "="*80)

COMPLETE GROUP DISTRIBUTION

Total Pokémon with Group filled in: 290

Distribution by group:
Legendary Type
Legendary           106
Mythical             31
Paradox              20
Pseudo-Legendary     36
Starter              86
Ultra Beast          11
Name: count, dtype: int64

SAMPLES FROM EACH GROUP

Starter:
     Pokemon Name  Pokedex Number Legendary Type
        Bulbasaur               1        Starter
          Ivysaur               2        Starter
         Venusaur               3        Starter
       Charmander               4        Starter
       Charmeleon               5        Starter
        Charizard               6        Starter
         Squirtle               7        Starter
        Wartortle               8        Starter
        Blastoise               9        Starter
Pikachu (Starter)              25        Starter

Legendary:
    Pokemon Name  Pokedex Number Legendary Type
        Articuno             144      Legendary
Articuno (Galar)           

---

## 3.9 Standardizing Column Names

In [None]:
print("="*80)
print("COLUMN NAME STANDARDIZATION")
print("="*80)

# Rename columns to shorter, cleaner names
df = df.rename(columns={
    'Pokedex Number': 'Pokedex Id',
    'Pokemon Name': 'Name',
    'Legendary Type': 'Group',
    'Pokemon Height': 'Height',
    'Pokemon Weight': 'Weight',
    'Health Stat': 'HP',
    'Attack Stat': 'Attack',
    'Defense Stat': 'Defense',
    'Special Attack Stat': 'SpAtk',
    'Special Defense Stat': 'SpDef',
    'Speed Stat': 'Speed',
    'Base Stat Total': 'BST',
    'Catch Rate': 'CatchRate',
    'Experience Growth': 'Experience Growth Rate',
    'Experience Growth Total': 'Experience Growth Points Total',
    'Primary Egg Group': 'Egg Group'
})

print("✓ Column names standardized!")
print(f"\n📋 New column list ({len(df.columns)} columns):")
for i, col in enumerate(df.columns, 1):
    print(f"   {i:2}. {col}")

COLUMN NAME STANDARDIZATION
✓ Column names standardized!

📋 New column list (27 columns):
    1. Pokedex Id
    2. Name
    3. Classification
    4. Alternate Form Name
    5. Group
    6. Height
    7. Weight
    8. Primary Type
    9. Secondary Type
   10. Primary Ability
   11. Secondary Ability
   12. Hidden Ability
   13. Male Ratio
   14. Female Ratio
   15. HP
   16. Attack
   17. Defense
   18. SpAtk
   19. SpDef
   20. Speed
   21. BST
   22. CatchRate
   23. Experience Growth Rate
   24. Experience Growth Points Total
   25. Egg Group
   26. Evolution Details
   27. Original Name
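
By default, `DataFrame.rename` silently ignores mapping keys that do not match any column, so a typo in the dictionary above would go unnoticed. A short sketch of a stricter variant using the `errors="raise"` parameter; the `toy` frame and the deliberately misspelled key are illustrative:

```python
import pandas as pd

# Empty toy frame with three of the original column names (hypothetical sample).
toy = pd.DataFrame(columns=["Pokedex Number", "Pokemon Name", "Catch Rate"])

mapping = {
    "Pokedex Number": "Pokedex Id",
    "Pokemon Name": "Name",
    "Catch Rate": "CatchRate",
}

# errors="raise" turns an unmatched key into a KeyError instead of a silent no-op.
renamed = toy.rename(columns=mapping, errors="raise")
print(list(renamed.columns))

try:
    toy.rename(columns={"Catch Rat": "CatchRate"}, errors="raise")  # deliberate typo
    typo_detected = False
except KeyError:
    typo_detected = True

print(typo_detected)
```

With this setting, re-running the rename cell on an already renamed frame also raises, which guards against accidentally executing the cell twice.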


---

## 4. Exporting the Cleaned Dataset

After all transformations, we export the cleaned dataset for use in analysis and modeling.

In [63]:
print("="*80)
print("CLEANED DATASET EXPORT")
print("="*80)

# Export the cleaned dataset
output_filename = 'pokemon_dataset_cleaned.csv'
df.to_csv(output_filename, index=False)

print("\n✅ Cleaned dataset exported successfully!")
print(f"   📁 File: {output_filename}")
print(f"   📊 Final dimensions: {df.shape[0]} rows x {df.shape[1]} columns")
# memory_usage reports the DataFrame's in-memory footprint, not the size of the CSV on disk
print(f"   💾 In-memory size: {round(df.memory_usage(deep=True).sum() / 1024**2, 2)} MB")

print("\n" + "="*80)
print("CLEANING SUMMARY")
print("="*80)
print("✓ Quotes removed from text columns")
print("✓ Column names standardized")
print("✓ Null values identified and handled")
print("✓ Dataset ready for exploratory analysis and modeling!")
print("\n📌 Next step: run 'analise.ipynb' for the EDA")

CLEANED DATASET EXPORT

✅ Cleaned dataset exported successfully!
   📁 File: pokemon_dataset_cleaned.csv
   📊 Final dimensions: 1199 rows x 27 columns
   💾 In-memory size: 1.1 MB

CLEANING SUMMARY
✓ Quotes removed from text columns
✓ Column names standardized
✓ Null values identified and handled
✓ Dataset ready for exploratory analysis and modeling!

📌 Next step: run 'analise.ipynb' for the EDA
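
One caveat when reloading the exported file: the cleaning steps store the literal string `'None'` as a placeholder, and some pandas versions treat `"None"` as a missing-value marker when parsing CSV (recent releases list it among the default NA strings, as far as I know). The sketch below, on hypothetical toy data, reads the CSV with `keep_default_na=False` so that only empty fields become NaN:

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for the cleaned dataset (hypothetical sample data).
cleaned = pd.DataFrame({
    "Name": ["Bulbasaur", "Rattata"],
    "Secondary Type": ["Poison", "None"],  # placeholder string written by the fill step
    "Group": ["Starter", np.nan],          # ordinary Pokémon keep a genuinely missing Group
})

# Write to an in-memory buffer instead of a file to keep the example self-contained.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# Default parsing: whether 'None' survives depends on the pandas version.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Explicit parsing: only empty fields are treated as missing.
buffer.seek(0)
strict_read = pd.read_csv(buffer, keep_default_na=False, na_values=[""])

print(default_read["Secondary Type"].isna().sum())
print(strict_read["Secondary Type"].tolist())
print(strict_read["Group"].isna().tolist())
```

If the downstream notebooks rely on the `'None'` placeholder, reading with explicit NA settings like these keeps the round trip independent of the installed pandas version.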


---

## 📝 Data Cleaning Summary

### Transformations Applied

1. **✅ Removal of Extra Quotes**
   - All text columns were processed
   - Stray double-quote characters (") removed

2. **✅ Column Name Standardization**
   - Simplified, consistent names
   - Standard abbreviations applied (HP, SpAtk, SpDef, BST, CatchRate)
   - 16 columns renamed; 27 columns in the final dataset

3. **✅ Null Value Handling**
   - Nulls analyzed column by column
   - `Evolution Details` filled with 'No Evolution' (703 rows); `Secondary Ability` (620), `Hidden Ability` (278), and `Secondary Type` (537) filled with 'None'
   - `Group` (formerly `Legendary Type`) stays null for Pokémon outside the special groups: 909 of 1,199 rows

4. **✅ Quality Checks**
   - No duplicates found
   - Data types validated
   - Descriptive statistics generated

---

### Cleaned Dataset: Final Characteristics

- **Generated file**: `pokemon_dataset_cleaned.csv`
- **Total Pokémon**: 1,199 records (alternate forms included)
- **Total columns**: 27
- **Main columns**:
  - Identification: `Pokedex Id`, `Name`, `Classification`
  - Physical attributes: `Height`, `Weight`
  - Combat stats: `HP`, `Attack`, `Defense`, `SpAtk`, `SpDef`, `Speed`, `BST`
  - Rarity: `Group` (Legendary, Mythical, Pseudo-Legendary, Starter, Paradox, Ultra Beast)
  - Capture: `CatchRate`
  - Types: `Primary Type`, `Secondary Type`
  - Abilities: `Primary Ability`, `Secondary Ability`, `Hidden Ability`

---

### 📊 Next Steps

✅ **Data Cleaning Complete** → Next stage: `analise.ipynb`

**1. Exploratory Data Analysis (EDA)** - Run `analise.ipynb`

The analysis notebook implements the following explorations:

- **Section 1: Loading and Preparation**
  - Loads `pokemon_dataset_cleaned.csv`
  - Removes null values from the main columns
  - Shows the distribution of Pokémon groups with bar charts

- **Section 2.1: Visualizations and Distributions**
  - Histograms of the 6 combat attributes (HP, Attack, Defense, SpAtk, SpDef, Speed)
  - Mean and median lines on each chart
  - Full descriptive statistics
  - Boxplot and violin plot of BST by Group
  - Correlation matrix (lower triangle)
  - Top 5 strongest correlations identified
  - Descriptive statistics by group
  - **ANOVA test** to validate significant differences between groups (α = 0.05)

- **Section 2.2: Type Analysis**
  - Top 10 most common primary types
  - Proportion of Pokémon with and without a secondary type
  - Horizontal bar charts of average attributes by type
  - Comparison of Attack, Defense, Speed, and BST across types

- **Section 2.3: CatchRate Analysis**
  - Overall distribution with a mean line
  - Boxplot of CatchRate by group
  - Scatter plot: CatchRate vs BST
  - Correlations of CatchRate with all attributes
  - CatchRate statistics by group
  - Interpretation: a higher CatchRate means the Pokémon is easier to capture

- **Section 3: Summary and Findings**
  - Main insights, organized
  - Important correlations identified
  - Statistical differences between groups confirmed
  - Recommendations for modeling

**2. Predictive Modeling** - Run `modelagem.ipynb`

After the exploratory analysis, the modeling notebook implements:

- **OLS Regression**: predict CatchRate with full diagnostics
- **Polynomial Regression**: capture non-linear relationships
- **Random Forest Classifier**: multiclass classification of groups
- **PyCaret AutoML**: automated model optimization

---

### 💡 Important Notes

- **Dataset ready for analysis**: All columns are cleaned and standardized
- **Nulls handled**: Decisions documented for each column; `Group` is intentionally left null for Pokémon outside the special groups
- **Reproducibility**: This notebook can be re-run to regenerate the cleaned dataset
- **Full workflow**: limpeza.ipynb → analise.ipynb → modelagem.ipynb

---

**✨ Cleaning completed successfully!** The dataset is ready for detailed exploratory analysis.