<a href="https://colab.research.google.com/github/blackcrowX/Data-Analysis-Projects/blob/main/Python/pokemon8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">Exploratory Data Analysis into Pokemon</h1>

<img src="https://static.wikia.nocookie.net/logo-timeline/images/2/21/Pok%C3%A9mon_%28Print%29.svg/revision/latest?cb=20181024043055"/>

<p align="center"><em>Image taken from: https://logo-timeline.fandom.com/wiki/Pok%C3%A9mon/Other</em></p>


## Table of Contents

*   Introduction
*   Data
*   Setup

  1. [Load Libraries](#1)
  2. [Load Data](#2)

*   Data Cleaning

  3. [Read Data](#3)
  4. [Basic Analysis](#4)
  5. [Data Cleaning](#5)

*   Data Analysis

  6. [Frequency](#6)
  7. [The Strongest and The Weakest](#7)
  8. [The Fastest and The Slowest](#8)
  9. [Summary](#9)

*   Data Visualisation

  10. [Count Plot](#11)
  11. [Pie Plot](#12)
  12. [Box Plot and Violin Plot](#13)
  13. [Swarm Plot](#14)
  14. [Heat Map](#15)

*   Findings

# Introduction 


This data analysis case study examines a dataset of Pokemon. It manipulates the data and uses statistics and visualisations to answer questions about it.

Considering how diverse Pokemon are, I was interested in analyzing this dataset to learn how the game is balanced and to identify the best Pokemon, if one exists.



## Data

The Pokemon dataset lists all 898 Pokemon species (1072 entries including alternate forms) as of 2021. It contains each Pokemon's number, name, first and second type, basic statistics, total statistics, generation, and legendary status.

Source: <a href="https://data.world/data-society/pokemon-with-stats">data.world</a>

# Setup

## Step 1: Import Libraries

Import and configure libraries required for data analysis.

In [3]:
import numpy as np
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

## Step 2: Import Dataset
Load the dataset from its URL into a pandas dataframe named `df`.

In [10]:
url = 'https://raw.githubusercontent.com/blackcrowX/Data-Analysis-Projects/main/Datasets/pokemon-stats-gen-1-8.csv'
df = pd.read_csv(url)

# Data Cleaning

## Step 3: Review Data

Read the first five rows of the dataframe.

In [11]:
df.head()

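| | number | name | type1 | type2 | total | hp | attack | defense | sp_attack | sp_defense | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 3 | Gigantamax Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |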


Read the last five rows of the dataframe.

In [6]:
df.tail()

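| | number | name | type1 | type2 | total | hp | attack | defense | sp_attack | sp_defense | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1067 | 896 | Glastrier | Ice | NaN | 580 | 100 | 145 | 130 | 65 | 110 | 30 | 8 | True |
| 1068 | 897 | Spectrier | Ghost | NaN | 580 | 100 | 65 | 60 | 145 | 80 | 130 | 8 | True |
| 1069 | 898 | Calyrex | Psychic | Grass | 500 | 100 | 80 | 80 | 80 | 80 | 80 | 8 | True |
| 1070 | 898 | Ice Rider Calyrex | Psychic | Ice | 680 | 100 | 165 | 150 | 85 | 130 | 50 | 8 | True |
| 1071 | 898 | Shadow Rider Calyrex | Psychic | Ghost | 680 | 100 | 85 | 80 | 165 | 100 | 150 | 8 | True |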


From the initial review of the head and tail of the dataframe, the number of entries (index 0 to 1071, so 1072 rows) matches the number of Pokemon species plus alternate forms. Furthermore, `type2` contains missing values (`NaN`), and each entry appears to be identified by a unique name rather than by its number, which alternate forms share with their base species.
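The observations above can be checked in code rather than by eye. The following is a minimal sketch that runs on a small sample copied from the rows shown above, not on the full dataset; the helper name `summarize_entries` is hypothetical. Applied to `df`, it would be expected to report 1072 rows and 898 distinct numbers if the dataset matches its description.

```python
import pandas as pd

# Sample rows copied from the head/tail output above (illustration only;
# in the notebook, pass `df` instead).
sample = pd.DataFrame({
    "number": [1, 2, 3, 3, 3, 898, 898, 898],
    "name": [
        "Bulbasaur", "Ivysaur", "Venusaur", "Mega Venusaur",
        "Gigantamax Venusaur", "Calyrex", "Ice Rider Calyrex",
        "Shadow Rider Calyrex",
    ],
})


def summarize_entries(frame):
    """Return the row count, the number of distinct species numbers,
    and whether every name occurs exactly once."""
    return {
        "rows": len(frame),
        "species": frame["number"].nunique(),
        "names_unique": bool(frame["name"].is_unique),
    }


summary = summarize_entries(sample)
print(summary)
```

In the sample, alternate forms such as Mega Venusaur share their number with the base species, which is why `species` is smaller than `rows` while the names stay unique.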

## Step 4: Data Info
Check the info of the dataset.

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1072 entries, 0 to 1071
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   number      1072 non-null   int64 
 1   name        1072 non-null   object
 2   type1       1072 non-null   object
 3   type2       574 non-null    object
 4   total       1072 non-null   int64 
 5   hp          1072 non-null   int64 
 6   attack      1072 non-null   int64 
 7   defense     1072 non-null   int64 
 8   sp_attack   1072 non-null   int64 
 9   sp_defense  1072 non-null   int64 
 10  speed       1072 non-null   int64 
 11  generation  1072 non-null   int64 
 12  legendary   1072 non-null   bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 101.7+ KB


The basic insight here is that the dataframe has 13 columns: nine of type integer, three of type object, and one of type boolean. The number of rows (1072) matches the number of Pokemon species including alternate forms. Each column's type fits its values: names and types are stored as strings, the statistics, number, and generation as integers, and `legendary` as a boolean.
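Grouping the columns by type is handy for later steps, for example when only the numeric statistics should be summarised. The sketch below uses `DataFrame.select_dtypes` on a small hand-built sample with a subset of the columns; the variable names `numeric_cols` and `bool_cols` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Two rows and a subset of columns, copied from the output above.
sample = pd.DataFrame({
    "number": [1, 896],
    "name": ["Bulbasaur", "Glastrier"],
    "type1": ["Grass", "Ice"],
    "type2": ["Poison", None],
    "total": [318, 580],
    "legendary": [False, True],
})

# "number" selects numeric dtypes only; boolean columns are not included.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
bool_cols = sample.select_dtypes(include="bool").columns.tolist()

print(numeric_cols)
print(bool_cols)
```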

## Step 5: Missing Values
Count the missing (`Null` or `NaN`) values in each column.


In [12]:
df.isnull().sum()

number          0
name            0
type1           0
type2         498
total           0
hp              0
attack          0
defense         0
sp_attack       0
sp_defense      0
speed           0
generation      0
legendary       0
dtype: int64

Only `type2` has missing values: 498 of the 1072 entries, roughly 46%. Since all Pokemon species have a primary type but not necessarily a secondary type, we'll fill in these missing values with the placeholder string `'None'`.

In [17]:
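# Assign the result back; calling fillna(inplace=True) on a column selection
# is deprecated in recent pandas and may leave df unchanged.
df['type2'] = df['type2'].fillna('None')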
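To see the effect of this step in isolation, here is a minimal sketch on a four-row sample taken from the rows shown earlier. It computes the count and share of missing values per column, then fills the gaps by assigning the result of `fillna` back to the column. The sample frame and the variable names `missing` and `missing_share` are assumptions made for illustration.

```python
import pandas as pd

# Four rows copied from the head/tail output; None marks a missing second type.
sample = pd.DataFrame({
    "name": ["Venusaur", "Glastrier", "Spectrier", "Calyrex"],
    "type1": ["Grass", "Ice", "Ghost", "Psychic"],
    "type2": ["Poison", None, None, "Grass"],
})

# Count and fraction of missing values per column, before filling.
missing = sample.isnull().sum()
missing_share = sample.isnull().mean()

# Replace missing second types with the placeholder string "None".
sample["type2"] = sample["type2"].fillna("None")

print(missing)
print(missing_share)
print(sample)
```

Using a string placeholder keeps the column usable for grouping and counting, at the cost that `"None"` now has to be treated as a special value in later steps.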

## Step 6: Organize Columns

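Create a new column (`type`) by combining columns `type1` and `type2`. Because the missing `type2` values were already replaced with the string `'None'` in Step 5, the `pd.isnull` check never fires, so single-type Pokemon receive labels such as `Ice _None`.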

In [23]:
df["type"] = df.apply(lambda x: x["type1"] if pd.isnull(x["type2"]) else f'{x["type1"]} _{x["type2"]}', axis=1)
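If single-type Pokemon should keep just their primary type as the label, the placeholder has to be tested explicitly. The following is one possible sketch, not the notebook's method: it builds the label with vectorised string concatenation and `Series.where` instead of a row-wise `apply`. The plain `_` separator and the variable name `has_second` are assumptions, and the resulting labels differ from those in the notebook's output.

```python
import pandas as pd

# Sample with the "None" placeholder already filled in, as after Step 5.
sample = pd.DataFrame({
    "type1": ["Grass", "Ice", "Psychic"],
    "type2": ["Poison", "None", "Ice"],
})

# True where a real secondary type exists.
has_second = sample["type2"] != "None"

# Combined label where a secondary type exists, primary type alone otherwise.
combined = sample["type1"] + "_" + sample["type2"]
sample["type"] = combined.where(has_second, sample["type1"])

print(sample)
```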

Rename column `type1` to `primary_type` and `type2` to `secondary_type`.

In [27]:
df.rename(columns={"type1": "primary_type", "type2": "secondary_type"}, inplace=True)

Check columns in the dataframe.

In [28]:
df.columns

Index(['number', 'name', 'primary_type', 'secondary_type', 'total', 'hp',
       'attack', 'defense', 'sp_attack', 'sp_defense', 'speed', 'generation',
       'legendary', 'type'],
      dtype='object')

## Step 7: Set Index

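Set the dataframe index to the `name` column. Note that `set_index` returns a new dataframe, so the call below only displays the re-indexed result and leaves `df` itself unchanged.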

In [29]:
df.set_index("name")

| name | number | primary_type | secondary_type | total | hp | attack | defense | sp_attack | sp_defense | speed | generation | legendary | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bulbasaur | 1 | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | Grass _Poison |
| Ivysaur | 2 | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | Grass _Poison |
| Venusaur | 3 | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | Grass _Poison |
| Mega Venusaur | 3 | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | Grass _Poison |
| Gigantamax Venusaur | 3 | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | Grass _Poison |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Glastrier | 896 | Ice | None | 580 | 100 | 145 | 130 | 65 | 110 | 30 | 8 | True | Ice _None |
| Spectrier | 897 | Ghost | None | 580 | 100 | 65 | 60 | 145 | 80 | 130 | 8 | True | Ghost _None |
| Calyrex | 898 | Psychic | Grass | 500 | 100 | 80 | 80 | 80 | 80 | 80 | 8 | True | Psychic _Grass |
| Ice Rider Calyrex | 898 | Psychic | Ice | 680 | 100 | 165 | 150 | 85 | 130 | 50 | 8 | True | Psychic _Ice |
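
To work with a name-indexed dataframe in later steps, the result of `set_index` has to be kept. A minimal sketch on a three-row sample follows; the variable names `indexed` and `mega_total` are hypothetical. Whether the rest of the notebook expects `name` as the index or as a regular column is not visible in this section, so the sketch stores the result under a separate name instead of overwriting the original.

```python
import pandas as pd

# Three rows copied from the output above.
sample = pd.DataFrame({
    "number": [3, 3, 898],
    "name": ["Venusaur", "Mega Venusaur", "Calyrex"],
    "total": [525, 625, 500],
})

# set_index returns a new dataframe; the original keeps "name" as a column.
indexed = sample.set_index("name")

# With names as the index, a single Pokemon can be looked up by label.
mega_total = indexed.loc["Mega Venusaur", "total"]

print(indexed)
print(mega_total)
```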
