**Table of Contents**
1. [Basic Analysis](#1)
    * [Data Cleaning](#2)
    * [Frequency](#3)
    * [The Strongest and The Weakest](#4)
    * [The Fastest and The Slowest](#5)
    * [Summary](#6)

 <a id = "1"></a><br>
 
## Basic Analysis

In [1]:
#importing all important packages
import numpy as np #linear algebra
import pandas as pd #data processing
import matplotlib.pyplot as plt #data visualisation
import seaborn as sns #data visualisation
%matplotlib inline

In [2]:
#Input Data
data = pd.read_csv("../input/pokemon/Pokemon.csv") #reading csv file and save it into a variable
data.head(10) #show the first 10 rows in data

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


Our data has 13 columns. Apart from the *#* number column, the remaining 12 are measured on these scales:
* *Name*       : Nominal data
* *Type 1*     : Nominal data
* *Type 2*     : Nominal data
* *Total*      : Ratio data
* *HP*         : Ratio data
* *Attack*     : Ratio data
* *Defense*    : Ratio data
* *Sp. Atk*    : Ratio data
* *Sp. Def*    : Ratio data
* *Speed*      : Ratio data
* *Generation* : Ordinal data
* *Legendary*  : Nominal data

<a id = "2"></a>
### Data Cleaning
I found some unneeded text in the *Name* column: for Mega forms, the base name is glued to the front of the form name. For example, "CharizardMega Charizard X" should be "Mega Charizard X". So we need to remove all characters before "Mega".

In [3]:
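#remove everything before "Mega"; regex=True is required because recent pandas versions treat the pattern as a literal string by default
data["Name"] = data["Name"].str.replace(".*(?=Mega)", "", regex=True)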
data.head(10)

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
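
To see what the lookahead pattern does and does not touch, here is a minimal sketch that applies the same regular expression with Python's built-in `re` module. The helper name `strip_mega_prefix` and the sample list are illustrative assumptions, not part of the notebook; "Meganium" is included only as a name that happens to start with "Mega".

```python
import re

# Same pattern as the cleaning cell: match everything that precedes "Mega".
MEGA_PREFIX = re.compile(r".*(?=Mega)")


def strip_mega_prefix(name: str) -> str:
    """Drop the duplicated base name in front of 'Mega' (hypothetical helper)."""
    return MEGA_PREFIX.sub("", name)


samples = [
    "CharizardMega Charizard X",  # base name glued to the Mega form
    "Bulbasaur",                  # no "Mega" at all
    "Meganium",                   # starts with "Mega", nothing to strip
    "KyogrePrimal Kyogre",        # glued name, but not a Mega form
]
cleaned = [strip_mega_prefix(name) for name in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Only the first name changes. Names such as "KyogrePrimal Kyogre" are left as they are, because the pattern looks specifically for "Mega".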


In [4]:
data = data.set_index('Name') #change and set the index to the name attribute
data = data.drop(['#'],axis=1) #drop the columns with axis=1; axis=0 is for rows
data.head()

Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,Fire,,309,39,52,43,60,50,65,1,False


If we look at row 5 (Charmander), there is a NaN value in the *Type 2* column. We can choose to either delete such rows or fill in the missing value. In this case, deleting every row that has a NaN would throw away all single-type Pokemon and distort our data, so we fill the gap by copying the value from the *Type 1* column.

In [5]:
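#assign the result back; calling fillna with inplace=True on a selected column is unreliable in recent pandas versions
data['Type 2'] = data['Type 2'].fillna(data['Type 1'])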
data.head(10)

Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,Fire,Fire,309,39,52,43,60,50,65,1,False
Charmeleon,Fire,Fire,405,58,64,58,80,65,80,1,False
Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
Mega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
Mega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
Squirtle,Water,Water,314,44,48,65,50,64,43,1,False
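
Because the fill copies *Type 1* into *Type 2*, single-type Pokemon can no longer be told apart from dual-type ones afterwards. A minimal sketch of one way to keep that information, using a three-row toy frame built from the rows shown above; the `Single type` column name is an assumption made for illustration.

```python
import pandas as pd

# Toy frame with three rows taken from the output above (not the full dataset).
toy = pd.DataFrame(
    {
        "Type 1": ["Grass", "Fire", "Fire"],
        "Type 2": ["Poison", None, "Flying"],
    },
    index=["Bulbasaur", "Charmander", "Charizard"],
)

# Record which rows had no secondary type before anything is filled in.
toy["Single type"] = toy["Type 2"].isna()

# Fill the gaps from Type 1 and assign the result back to the column.
toy["Type 2"] = toy["Type 2"].fillna(toy["Type 1"])

print(toy)
print("Rows that were filled:", int(toy["Single type"].sum()))
```

With such a flag, later counts on *Type 2* can be restricted to genuine secondary types if needed.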


<a id = "3"></a>
### Frequency
Now, let's see all unique types in *Type 1* and *Type 2*.

In [6]:
print("Type 1:",data["Type 1"].unique(), "=", len(data["Type 1"].unique()))
print("Type 2:",data["Type 2"].unique(), "=", len(data["Type 2"].unique()))

Type 1: ['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'
 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'
 'Flying'] = 18
Type 2: ['Poison' 'Fire' 'Flying' 'Dragon' 'Water' 'Bug' 'Normal' 'Electric'
 'Ground' 'Fairy' 'Grass' 'Fighting' 'Psychic' 'Steel' 'Ice' 'Rock' 'Dark'
 'Ghost'] = 18


So there are 18 unique types in each column.
Next, we use *value_counts()* to count how often each type appears in *Type 1* and *Type 2*.

In [7]:
print(data["Type 1"].value_counts())
print(data["Type 2"].value_counts())

Water       112
Normal       98
Grass        70
Bug          69
Psychic      57
Fire         52
Electric     44
Rock         44
Dragon       32
Ghost        32
Ground       32
Dark         31
Poison       28
Fighting     27
Steel        27
Ice          24
Fairy        17
Flying        4
Name: Type 1, dtype: int64
Flying      99
Water       73
Psychic     71
Normal      65
Grass       58
Poison      49
Ground      48
Fighting    46
Fire        40
Fairy       38
Electric    33
Dark        30
Dragon      29
Steel       27
Ice         27
Ghost       24
Rock        23
Bug         20
Name: Type 2, dtype: int64


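We can conclude that the most frequent type in *Type 1* is **Water** and in *Type 2* is **Flying**. On the other hand, the least frequent type in *Type 1* is **Flying** and in *Type 2* is **Bug**. Keep in mind that the *Type 2* counts include the values copied from *Type 1* for single-type Pokemon.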
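
The most and least frequent values can also be read off programmatically instead of by eye. A minimal sketch on a made-up six-element column (the values are illustrative, not the real counts):

```python
import pandas as pd

# Made-up column for illustration; the real notebook would use data["Type 1"].
types = pd.Series(
    ["Water", "Water", "Water", "Grass", "Grass", "Flying"], name="Type 1"
)

n_unique = types.nunique()                     # number of distinct types
counts = types.value_counts()                  # absolute frequencies
shares = types.value_counts(normalize=True)    # relative frequencies

most_common = counts.idxmax()    # label with the highest count
least_common = counts.idxmin()   # label with the lowest count

print(n_unique, most_common, least_common)
print(shares)
```

`nunique()` gives the same number as `len(... .unique())`, and `idxmax()` / `idxmin()` return the type label rather than the count.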

<a id = "4"></a>
### The Strongest and The Weakest
**Which Pokemon are the strongest and the weakest of each type?** Let's find out.

In [8]:
strongest = data.sort_values(by='Total', ascending=False) #sorting the rows in descending order
strongest.drop_duplicates(subset=['Type 1'],keep='first')
#the rows are now sorted by Total in descending order,
#so the first row seen for each Type 1 value is the strongest Pokemon of that type;
#drop_duplicates keeps that first row and discards the other rows of the same type

Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Mega Rayquaza,Dragon,Flying,780,105,180,100,180,100,115,3,True
Mega Mewtwo Y,Psychic,Psychic,780,106,150,70,194,120,140,1,True
KyogrePrimal Kyogre,Water,Water,770,100,150,90,180,160,90,3,True
GroudonPrimal Groudon,Ground,Fire,770,100,180,160,150,90,90,3,True
Arceus,Normal,Normal,720,120,120,120,120,120,120,4,True
Mega Metagross,Steel,Psychic,700,80,145,150,105,110,110,3,False
Mega Tyranitar,Rock,Dark,700,100,164,150,95,120,71,2,False
GiratinaOrigin Forme,Ghost,Dragon,680,150,120,100,120,100,90,4,True
Ho-oh,Fire,Flying,680,106,130,90,110,154,90,2,True
Xerneas,Fairy,Fairy,680,126,131,95,131,98,99,6,True


Now we know the strongest Pokemon of each type. The strongest of all is **Mega Rayquaza**, a Dragon type with a Total of 780, a score it shares with Mega Mewtwo Y. We also see that 10 of the 18 strongest Pokemon by type are Legendary. Let's check which are the weakest of each type.

In [9]:
weakest = data.sort_values(by='Total') #sorting the rows in ascending order
weakest.drop_duplicates(subset=['Type 1'],keep='first')

Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Sunkern,Grass,Grass,180,30,30,30,30,30,30,2,False
Azurill,Normal,Fairy,190,50,20,40,20,40,20,3,False
Kricketot,Bug,Bug,194,37,25,41,25,41,25,4,False
Ralts,Psychic,Fairy,198,28,25,25,45,35,40,3,False
Magikarp,Water,Water,200,20,10,55,15,20,80,1,False
Pichu,Electric,Electric,205,20,40,15,35,35,60,2,False
Tyrogue,Fighting,Fighting,210,35,35,35,35,35,35,2,False
Cleffa,Fairy,Fairy,218,50,25,28,45,55,15,2,False
Poochyena,Dark,Dark,220,35,55,35,30,30,35,3,False
Zubat,Poison,Flying,245,40,45,35,30,40,55,1,False


Now we know the weakest Pokemon of each type. The weakest of all is **Sunkern**, a Grass type with a Total of 180. None of the Pokemon in this table is Legendary.
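
The sort-then-`drop_duplicates` approach used above can also be written with `groupby`. The sketch below is an alternative formulation, not the notebook's method, and it assumes the index (the Pokemon names) is unique. The six rows are copied from the outputs above, so the result only describes this toy frame, not the full dataset.

```python
import pandas as pd

# Six rows copied from the tables above; names serve as the (unique) index.
toy = pd.DataFrame(
    {
        "Type 1": ["Grass", "Grass", "Water", "Water", "Fire", "Fire"],
        "Total": [180, 625, 200, 770, 309, 680],
        "Legendary": [False, False, False, True, False, True],
    },
    index=[
        "Sunkern", "Mega Venusaur", "Magikarp",
        "KyogrePrimal Kyogre", "Charmander", "Ho-oh",
    ],
)

# idxmax / idxmin return, per group, the index label of the extreme row.
strongest = toy.loc[toy.groupby("Type 1")["Total"].idxmax()]
weakest = toy.loc[toy.groupby("Type 1")["Total"].idxmin()]

# Count how many of the per-type strongest rows are Legendary.
n_legendary = int(strongest["Legendary"].sum())

print(strongest)
print(weakest)
print("Legendary among the strongest:", n_legendary)
```

On the real data, the same two lines would return one row for each of the 18 types, which makes a count such as the Legendary share easy to verify.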

<a id = "5"></a>
### The Fastest and The Slowest
**Now, which Pokemon are the fastest and the slowest of each type?**

In [10]:
fastest = data.sort_values(by='Speed', ascending=False) #sorting the rows in descending order
fastest.drop_duplicates(subset=['Type 1'],keep='first')

Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
DeoxysSpeed Forme,Psychic,Psychic,600,50,95,90,95,90,180,3,True
Ninjask,Bug,Flying,456,61,90,45,50,50,160,3,False
Mega Aerodactyl,Rock,Flying,615,80,135,85,70,95,150,1,False
Mega Sceptile,Grass,Dragon,630,70,110,75,145,85,145,3,False
Electrode,Electric,Electric,480,60,50,70,80,80,140,1,False
Mega Lopunny,Normal,Fighting,580,65,136,94,54,96,135,4,False
Crobat,Poison,Flying,535,85,90,80,70,80,130,2,False
Mega Gengar,Ghost,Poison,600,60,65,80,170,95,130,1,False
Talonflame,Fire,Flying,499,78,81,71,74,69,126,6,False
Darkrai,Dark,Dark,600,70,90,90,135,90,125,4,True


The fastest Pokemon is **DeoxysSpeed Forme**, a Legendary Psychic type with a Speed of 180.

In [11]:
slowest = data.sort_values(by='Speed') #sorting the rows in ascending order
slowest.drop_duplicates(subset=['Type 1'],keep='first')

Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Munchlax,Normal,Normal,390,135,85,40,40,85,5,4,False
Shuckle,Bug,Rock,505,20,10,230,10,230,5,2,False
Ferroseed,Grass,Steel,305,44,50,91,24,86,10,5,False
Bonsly,Rock,Rock,290,50,80,95,10,45,10,4,False
Trapinch,Ground,Ground,290,45,100,45,45,45,10,3,False
Wooper,Water,Ground,210,55,45,45,25,25,15,2,False
Cleffa,Fairy,Fairy,218,50,25,28,45,55,15,2,False
Litwick,Ghost,Fire,275,50,30,55,65,55,20,5,False
Torkoal,Fire,Fire,470,70,85,140,85,70,20,3,False
Mega Sableye,Dark,Ghost,480,50,85,125,85,115,20,3,False


This table shows that the slowest Pokemon of all are Munchlax (Normal type) and Shuckle (Bug type), both with a Speed of 5.
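
Two Pokemon share the minimum Speed of 5 in the table above, and `drop_duplicates(keep='first')` would hide such a tie if both belonged to the same type. A minimal sketch of a tie-aware lookup, using four rows copied from the output above:

```python
import pandas as pd

# Four rows copied from the slowest-by-type table above.
toy = pd.DataFrame(
    {
        "Type 1": ["Normal", "Bug", "Grass", "Rock"],
        "Speed": [5, 5, 10, 10],
    },
    index=["Munchlax", "Shuckle", "Ferroseed", "Bonsly"],
)

# Keep every row whose Speed equals the overall minimum, so ties are preserved.
slowest_overall = toy[toy["Speed"] == toy["Speed"].min()]

print(slowest_overall)
```

Comparing against the minimum returns all tied rows, whereas a sort followed by taking the first row would return only one of them.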

<a id = "6"></a>
### Summary

In [12]:
#now, let's summarize the data
data.describe()

,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


_________________________________________________________

<a id = "7"></a>
## Data Visualisation
Now we move on to the important part, where we extract information by visualizing our data. First, we make count plots to see the value counts for each type.