<a href="https://colab.research.google.com/github/brenoslivio/SCC0652_Computational_Visualization/blob/master/notebooks/Projeto_1_Processamento_de_dados.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Projeto 1 - Processamento de dados

Alunos:

Afonso Henrique Piacentini Garcia, Nº USP: 9795272

Breno Lívio Silva de Almeida, Nº USP: 10276675

Vitor Henrique Gratiere Torres, Nº USP: 10284952

---

## Descrição

Essa parte do projeto da disciplina consiste no processamento de um conjunto de dados, um dataset. O conjunto de dados escolhido foi o de [Pokémons](https://www.pokemon.com/br/), do vídeo-game homônimo. As criaturas, chamadas de Pokémons, são representadas no jogo em questão por um item chamado [Pokédex](https://www.pokemon.com/br/pokedex/), que contém o nome e outras estatísticas relevantes aos jogos, como pontos de vida, ataque e etc. Os dados referentes à Pokédex foram retirados do site [Pokémon Database](https://pokemondb.net/pokedex/all). Também retiramos um [dataset do site Kaggle](https://www.kaggle.com/mariotormo/complete-pokemon-dataset-updated-090420). Ele será usado para utilizar informações relevantes como as gerações, peso e altura dos Pokémons.

### [pokemon.csv](https://github.com/brenoslivio/SCC0652_Computational_Visualization/blob/master/notebooks/crawler/pokemon.csv)

*   attack: stat que indica o poder de ataque fisico base do pokémon
*   cod: O número do pokémon na pokédex
*   defense: stat que indica a defesa física base do pokémon
*   hp: stat que indica a vida base do pokémon
*   img: link para a imagem do pokémon 
*   name: O nome do pokémon
*   spatk: stat que indica o poder de ataque especial do pokémon
*   spdef: stat que indica o poder de defesa especial do pokémon
*   speed: stat que indica a velocidade base do pokémon
*   form: discriminante de mega evoluções e outras formas
*   total: soma total dos stats do pokémon
*   type1: tipo do pokémon
*   type2: segundo tipo do pokémon, caso possua
*   generation: a qual geração o pokémon em questão pertence
*   status: refere-se a raridade do pokémon, variando de normal até mítico
*   species: apelido dado para cada pokémon baseado em algum de seus atributos
*   height_m: altura do pokémon em metros
*   weight_kg: peso do pokémon em kilogramas

### Gerações dos Pokémons

A série de jogos Pokemon se iniciou em 1996, com os jogos *Pokemon Red* e *Pokemon Green*. Com o passar do tempo e lançamento de novos jogos, novas criaturas e mecânicas foram sendo integradas ao jogo, e com isso criou-se o sistema de gerações para facilitar a organização dos pokemons.

1ª geração: Bulbasaur 001 - Mew 151

2ª geração: Chikorita 152 - Celebi 251

3ª geração: Treecko 252 - Deoxys 386

4ª geração: Turtwig 387 - Arceus 493

5ª geração: Victini 494 - Genesect 649

6ª geração: Chespin 650 - Volcanion 721

7ª geração: Rowlet 722 - Melmetal 809

8ª geração: Grookey 810 - Eternamax 890

### [pokemonKaggle.csv](https://github.com/brenoslivio/SCC0652_Computational_Visualization/blob/master/notebooks/kaggle/pokedexKaggle.csv)

## Desenvolvimento


Leitura do CSV gerado do Pokémon Database como data frame do Pandas

In [1]:
import numpy as np
import pandas as pd

from IPython.display import Image
from IPython.core.display import HTML

dfPokemonDB = pd.read_csv(
    "crawler/pokemon.csv",
    dtype={'attack': np.int32,
           'cod': str,
           'defense': np.int32,
           'hp': np.int32,
           'spatk': np.int32,
           'spdef': np.int32,
           'speed': np.int32,
           'total': np.int32},
    keep_default_na=False)

dfPokemonDB['img'].replace('', np.nan, inplace=True)

dfPokemonDB.dropna(subset=['img'], inplace=True)

dfPokemonDB.sample(5)

Unnamed: 0,attack,cod,defense,form,hp,img,name,spatk,spdef,speed,total,type1,type2
471,104,392,71,,76,https://img.pokemondb.net/sprites/home/normal/...,Infernape,104,71,108,534,Fire,Fighting
1002,60,869,75,,65,https://img.pokemondb.net/sprites/home/normal/...,Alcremie,110,121,64,495,Fairy,
421,165,354,75,Mega Banette,64,https://img.pokemondb.net/sprites/home/normal/...,Banette,93,83,75,555,Ghost,
924,137,795,37,,71,https://img.pokemondb.net/sprites/home/normal/...,Pheromosa,137,37,151,570,Bug,Fighting
636,103,542,80,,75,https://img.pokemondb.net/sprites/home/normal/...,Leavanny,70,80,92,500,Bug,Grass


Leitura do CSV retirado do Kaggle como data frame do Pandas

In [2]:
dfPokemonKaggle = pd.read_csv(
    "https://raw.githubusercontent.com/brenoslivio/SCC0652_Computatio" +
    "nal_Visualization/master/notebooks/kaggle/pokedexKaggle.csv",
    usecols=['name', 'generation', 'status',
             'species', 'height_m', 'weight_kg'],
    dtype={'name': str,
           'generation': np.int32,
           'status': str,
           'species': str,
           'height_m': np.float64,
           'weight_kg': np.float64})

dfPokemonKaggle.sample(5)

Unnamed: 0,name,generation,status,species,height_m,weight_kg
205,Ledyba,2,Normal,Five Star Pokémon,1.0,10.8
962,Dubwool,8,Normal,Sheep Pokémon,1.3,43.0
37,Alolan Sandshrew,1,Normal,Mouse Pokémon,0.7,40.0
229,Skiploom,2,Normal,Cottonweed Pokémon,0.6,1.0
815,Goodra,6,Normal,Dragon Pokémon,2.0,150.5


Vamos juntar os dois datasets, combinando as variáveis relevantes para análise.

Para unificarmos duas tabelas precisamos guiar a junção com chaves primárias, no dataframe ```dfPokemonDB``` a chave primária é a junção dos atributos ```name``` e ```form```, enquanto no dataframe ```dfPokemonKaggle``` é o atributo ```name```.

In [3]:
# No nosso dataframe criamos uma nova coluna 'tmp' que recebe uma copia
# da coluna 'form'
dfPokemonDB['tmp'] = dfPokemonDB.loc[:, 'form']

# Nessa nova coluna todas as linhas que estiverem vazias receberao os
# valores da coluna 'name'
dfPokemonDB.loc[dfPokemonDB.loc[:, 'tmp'] == '',
                'tmp'] = dfPokemonDB.loc[:, 'name']

# Tratando casos de borda:
# Pokemons com 'Form', 'Cloak', 'Mode', 'Male', 'Female', 'Size' e
# 'Style' na sua forma devem seguir a definicao 'tmp' = 'name' + 'form'
for exception in ['Form', 'Cloak', 'Mode', 'Male',
                  'Female', 'Size', 'Style']:
    dfPokemonDB.loc[dfPokemonDB.loc[:, 'form'].str.contains(exception),
                    'tmp'] = dfPokemonDB.loc[:, 'name'] + " " + dfPokemonDB.loc[:, 'form']

# Tratando casos de borda:
# Pokemons com os nomes 'Hoopa', 'Eiscue', 'Zacian' e 'Zamazenta'
# tambem devem seguir a definicao 'tmp' = 'name' + 'form'
for exception in ['Hoopa', 'Eiscue', 'Zacian', 'Zamazenta']:
    dfPokemonDB.loc[dfPokemonDB.loc[:, 'name'].str.contains(exception),
                    'tmp'] = dfPokemonDB.loc[:, 'name'] + " " + dfPokemonDB.loc[:, 'form']

# Duas excecoes precisam ser tratadas manualmente
dfPokemonDB.loc[dfPokemonDB.loc[:, 'name'] == 'Keldeo',
                'tmp'] = dfPokemonDB.loc[:, 'tmp'] + "e"
dfPokemonDB.loc[dfPokemonDB.loc[:, 'form'] == 'Eternamax',
                'tmp'] = dfPokemonDB.loc[:, 'name'] + " " + dfPokemonDB.loc[:, 'form']

In [4]:
# Mudaremos tambem o nome da coluna 'name' no dataframe do Kaggle para
# 'tmp'
dfPokemonKaggle.rename(columns={'name': 'tmp'}, inplace=True)

dfPokemon = pd.merge(dfPokemonDB, dfPokemonKaggle,
                     on='tmp', how='outer')

dfPokemon = dfPokemon.drop('tmp', axis=1)

dfPokemon = dfPokemon[['cod',
                       'name',
                       'generation',
                       'status',
                       'species',
                       'form',
                       'type1',
                       'type2',
                       'height_m',
                       'weight_kg',
                       'total',
                       'hp',
                       'attack',
                       'defense',
                       'spatk',
                       'spdef',
                       'speed',
                       'img']]

dfPokemon.sample(5)

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
433,366,Clamperl,3,Normal,Bivalve Pokémon,,Water,,0.4,52.5,345,35,64,85,74,55,32,https://img.pokemondb.net/sprites/home/normal/...
493,414,Mothim,4,Normal,Moth Pokémon,,Bug,Flying,0.9,23.3,424,70,94,50,94,50,66,https://img.pokemondb.net/sprites/home/normal/...
266,222,Corsola,2,Normal,Coral Pokémon,,Water,Rock,0.6,5.0,410,65,55,95,65,95,35,https://img.pokemondb.net/sprites/home/normal/...
100,77,Ponyta,1,Normal,Unique Horn Pokémon,Galarian Ponyta,Psychic,,0.8,24.0,410,50,85,55,65,65,90,https://img.pokemondb.net/sprites/home/normal/...
89,70,Weepinbell,1,Normal,Flycatcher Pokémon,,Grass,Poison,1.0,6.4,390,65,90,50,85,45,55,https://img.pokemondb.net/sprites/home/normal/...


In [5]:
# Exportando o dataframe para a proxima parte do projeto
dfPokemon.to_csv('dfPokemon.csv', index=False)

Achando outliers

In [6]:
dfPokemon.drop((dfPokemon.loc[:, 'form'] == 'Eternamax').index)

Q1, Q3 = dfPokemon.loc[:, 'total'].quantile([.25, .75])
IQR = Q3 - Q1

outliers = dfPokemon.loc[
    (dfPokemon.loc[:, 'total'] < (Q1 - 1.5 * IQR)) | 
    (dfPokemon.loc[:, 'total'] > (Q3 + 1.5 * IQR)), :]
outliers

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
1027,890,Eternatus,8,Legendary,Gigantic Pokémon,Eternamax,Poison,Dragon,100.0,,1125,255,115,250,125,250,130,https://img.pokemondb.net/sprites/home/normal/...


In [7]:
Q1 = dfPokemon.loc[:, 'total'].quantile(.25)  # primeiro quartil
Q3 = dfPokemon.loc[:, 'total'].quantile(.75)  # segundo quartil
IQR = Q3 - Q1  # distância interquartil

# vetor contendo os valores booleanos para cada uma das observações
# indicando se ele é um outlier (True) ou não (False)
out = ((dfPokemon.loc[:, 'total'] < (Q1 - 1.5 * IQR)) |
       (dfPokemon.loc[:, 'total'] > (Q3 + 1.5 * IQR)))

# vetor contendo os outliers para o conjunto de dados
pokemonOut = dfPokemon.loc[out, :]

pokemonOut

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
1027,890,Eternatus,8,Legendary,Gigantic Pokémon,Eternamax,Poison,Dragon,100.0,,1125,255,115,250,125,250,130,https://img.pokemondb.net/sprites/home/normal/...


### Comparações entre os extremos dos stats

Comparações entre os extremos de todos os stats presentes nos pokémons, com imagens dos pokémons que possuem tais extremos


**Comparação hp**

In [8]:
hpPokemonMax = dfPokemon.loc[[dfPokemon['hp'].idxmax()]]
hpPokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
288,242,Blissey,2,Normal,Happiness Pokémon,,Normal,,1.5,46.8,540,255,10,10,75,135,55,https://img.pokemondb.net/sprites/home/normal/...


In [9]:
Image(url=hpPokemonMax['img'].item(), width=150, height=150)

In [10]:
hpPokemonMin = dfPokemon.loc[[dfPokemon['hp'].idxmin()]]
hpPokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
345,292,Shedinja,3,Normal,Shed Pokémon,,Bug,Ghost,0.8,1.2,236,1,90,45,30,30,40,https://img.pokemondb.net/sprites/home/normal/...


In [11]:
Image(url=hpPokemonMin['img'].item(), width=150, height=150)

**Comparação attack**

In [12]:
attackPokemonMax = dfPokemon.loc[[dfPokemon['attack'].idxmax()]]
attackPokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
189,150,Mewtwo,1,Legendary,Genetic Pokémon,Mega Mewtwo X,Psychic,Fighting,2.3,127.0,780,106,190,100,154,100,130,https://img.pokemondb.net/sprites/home/normal/...


In [13]:
Image(url=attackPokemonMax['img'].item(), width=150, height=150)

In [14]:
attackPokemonMin = dfPokemon.loc[[dfPokemon['attack'].idxmin()]]
attackPokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
145,113,Chansey,1,Normal,Egg Pokémon,,Normal,,1.1,34.6,450,250,5,5,35,105,50,https://img.pokemondb.net/sprites/home/normal/...


In [15]:
Image(url=attackPokemonMin['img'].item(), width=150, height=150)

**Comparação defense**

In [16]:
defensePokemonMax = dfPokemon.loc[[dfPokemon['defense'].idxmax()]]
defensePokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
1027,890,Eternatus,8,Legendary,Gigantic Pokémon,Eternamax,Poison,Dragon,100.0,,1125,255,115,250,125,250,130,https://img.pokemondb.net/sprites/home/normal/...


In [17]:
Image(url=defensePokemonMax['img'].item(), width=150, height=150)

In [18]:
defensePokemonMin = dfPokemon.loc[[dfPokemon['defense'].idxmin()]]
defensePokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
145,113,Chansey,1,Normal,Egg Pokémon,,Normal,,1.1,34.6,450,250,5,5,35,105,50,https://img.pokemondb.net/sprites/home/normal/...


In [19]:
Image(url=defensePokemonMin['img'].item(), width=150, height=150)

**Comparação spatk**

In [20]:
spatkPokemonMax = dfPokemon.loc[[dfPokemon['spatk'].idxmax()]]
spatkPokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
190,150,Mewtwo,1,Legendary,Genetic Pokémon,Mega Mewtwo Y,Psychic,,1.5,33.0,780,106,150,70,194,120,140,https://img.pokemondb.net/sprites/home/normal/...


In [21]:
Image(url=spatkPokemonMax['img'].item(), width=150, height=150)

In [22]:
spatkPokemonMin = dfPokemon.loc[[dfPokemon['hp'].idxmin()]]
spatkPokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
345,292,Shedinja,3,Normal,Shed Pokémon,,Bug,Ghost,0.8,1.2,236,1,90,45,30,30,40,https://img.pokemondb.net/sprites/home/normal/...


In [23]:
Image(url=spatkPokemonMin['img'].item(), width=150, height=150)

**Comparação spdef**

In [24]:
spdefPokemonMax = dfPokemon.loc[[dfPokemon['spdef'].idxmax()]]
spdefPokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
1027,890,Eternatus,8,Legendary,Gigantic Pokémon,Eternamax,Poison,Dragon,100.0,,1125,255,115,250,125,250,130,https://img.pokemondb.net/sprites/home/normal/...


In [25]:
Image(url=spdefPokemonMax['img'].item(), width=150, height=150)

In [26]:
spdefPokemonMin = dfPokemon.loc[[dfPokemon['spdef'].idxmin()]]
spdefPokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
13,10,Caterpie,1,Normal,Worm Pokémon,,Bug,,0.3,2.9,195,45,30,35,20,20,45,https://img.pokemondb.net/sprites/home/normal/...


In [27]:
Image(url=spdefPokemonMin['img'].item(), width=150, height=150)

**Comparação speed**

In [28]:
speedPokemonMax = dfPokemon.loc[[dfPokemon['speed'].idxmax()]]
speedPokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
463,386,Deoxys,3,Mythical,DNA Pokémon,Speed Forme,Psychic,,1.7,60.8,600,50,95,90,95,90,180,https://img.pokemondb.net/sprites/home/normal/...


In [29]:
Image(url=speedPokemonMax['img'].item(), width=150, height=150)

In [30]:
speedPokemonMin = dfPokemon.loc[[dfPokemon['speed'].idxmin()]]
speedPokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
256,213,Shuckle,2,Normal,Mold Pokémon,,Bug,Rock,0.6,20.5,505,20,10,230,10,230,5,https://img.pokemondb.net/sprites/home/normal/...


In [31]:
Image(url=speedPokemonMin['img'].item(), width=150, height=150)

**Comparação total**

In [32]:
totalPokemonMax = dfPokemon.loc[[dfPokemon['total'].idxmax()]]
totalPokemonMax

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
1027,890,Eternatus,8,Legendary,Gigantic Pokémon,Eternamax,Poison,Dragon,100.0,,1125,255,115,250,125,250,130,https://img.pokemondb.net/sprites/home/normal/...


In [33]:
Image(url=totalPokemonMax['img'].item(), width=150, height=150)

In [34]:
totalPokemonMin = dfPokemon.loc[[dfPokemon['total'].idxmin()]]
totalPokemonMin

Unnamed: 0,cod,name,generation,status,species,form,type1,type2,height_m,weight_kg,total,hp,attack,defense,spatk,spdef,speed,img
871,746,Wishiwashi,7,Normal,Small Fry Pokémon,Solo Form,Water,,0.2,0.3,175,45,20,20,25,25,40,https://img.pokemondb.net/sprites/home/normal/...


In [35]:
Image(url=totalPokemonMin['img'].item(), width=150, height=150)