# Pokemon Data Analysis
---
Introduction to Data Analysis in Python using `pandas`, `numpy`, and `matplotlib`.

All data drawn from Kaggle's "Complete Pokemon Dataset" for Generations 1 - 7 and a web scraping of Serebii.net for Gen 8.
The sites for the original datasets are listed below: 
- [Generations 1 - 7](https://www.kaggle.com/rounakbanik/pokemon/version/1)
- [Generation 8](https://github.com/yaylinda/serebii-parser)  
  
![Pokeball](https://image.businessinsider.com/5dcee8473afd37158f6c8ab9?width=1100&format=jpeg&auto=webp)

## Installing Dependencies
--- 
First, we need to make sure that the libraries we need are installed. There are a couple of ways to do this, one of which involves conda, but I like `pip`. `pip`, the package installer for Python, handles this in a couple of lines.

In [None]:
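# Install pip packages in the current Jupyter kernel
import sys
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install xlrd
# xlrd 2.0+ reads only legacy .xls files; pandas uses openpyxl for .xlsx workbooks
!{sys.executable} -m pip install openpyxl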

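As a quick sanity check after installing, the standard library can report which packages the current kernel still cannot find. This is a minimal sketch; the helper name `missing_packages` is hypothetical and not part of the original notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be located by the import system."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means everything this notebook needs is available
print(missing_packages(["pandas", "numpy", "matplotlib", "openpyxl"]))
```

If the printed list is not empty, rerun the install cell for the packages it names.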

## Importing The Proper Libraries
---
For this analysis, I'm going to be using `pandas` to interact with the data. Below, we import the libraries under their conventional aliases: "`pd`" for pandas, "`np`" for numpy, and "`plt`" for matplotlib's plotting module. The regular expression module, "`re`", is also imported for some more complex sorting, but more on that later.

In [None]:
# import pandas and numpy for data analysis
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import re  # regular expressions, used later for more complex sorting


## Reading Data Into The Notebook
Pandas has a handy set of functions named `read_{file format}` (for example, `read_csv` and `read_excel`) that import your data into a workable structure called a dataframe. Both of my source files are .CSV, while the cell below reads the Excel workbook `pokedex.xlsx` with `read_excel`. There are plenty of other supported formats, including standard .txt files.

In [3]:
pokemon = pd.read_excel("pokedex.xlsx")
pokemon

Unnamed: 0,Pokedex Number,Name,Type 1,Type 2,HP,Attack,Defense,Special Attack,Special Defense,Speed,...,Against Ground,Against Ice,Against Normal,Against Poison,Against Psychic,Against Rock,Against Steel,Against Water,Steps to Hatch,Experience Growth
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.5,5120.0,1059860.0
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.5,5120.0,1059860.0
2,3,Venusaur,Grass,Poison,80,100,123,122,120,80,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.5,5120.0,1059860.0
3,4,Charmander,Fire,,39,52,43,60,50,65,...,2.0,0.5,1.0,1.0,1,2.0,0.5,2.0,5120.0,1059860.0
4,5,Charmeleon,Fire,,58,64,58,80,65,80,...,2.0,0.5,1.0,1.0,1,2.0,0.5,2.0,5120.0,1059860.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
916,886,Drakloak,Dragon,Ghost,68,80,50,60,50,102,...,1.0,2.0,0.0,0.5,1,1.0,1.0,0.5,,
917,887,Dragapult,Dragon,Ghost,88,120,75,100,75,142,...,1.0,2.0,0.0,0.5,1,1.0,1.0,0.5,,
918,888,Zacian,Fairy,Steel,92,130,115,80,115,138,...,2.0,0.5,0.5,0.0,0.5,0.5,1.0,1.0,,
919,889,Zamazenta,Fighting,Steel,92,130,115,80,115,138,...,2.0,0.5,0.5,0.0,1,0.5,0.5,1.0,,
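For comparison, reading CSV data works the same way through `read_csv`. The sketch below assumes a tiny hand-written sample held in memory rather than the real dataset, so it runs without any files on disk; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the notebook itself reads pokedex.xlsx
csv_text = """Name,Type 1,Type 2,HP
Bulbasaur,Grass,Poison,45
Charmander,Fire,,39
Squirtle,Water,,44
"""

# read_csv accepts a path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                    # (rows, columns)
print(sample["Type 2"].isna().sum())   # empty fields are read as NaN
```

Empty fields, such as the missing second type, come back as `NaN`, which matches the blank `Type 2` cells in the output above.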


In [4]:
pokemon.columns

Index(['Pokedex Number', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense',
       'Special Attack', 'Special Defense', 'Speed', 'Generation',
       'Is Legendary', 'Weight (kg)', 'Against Bug', 'Against Dark',
       'Against Dragon', 'Against Electric', 'Against Fairy',
       'Against Fighting', 'Against Fire', 'Against Flying', 'Against Ghost',
       'Against Grass', 'Against Ground', 'Against Ice', 'Against Normal',
       'Against Poison', 'Against Psychic', 'Against Rock', 'Against Steel',
       'Against Water', 'Steps to Hatch', 'Experience Growth'],
      dtype='object')

In [12]:
# Creates a new df containing only pokemon that have Grass as one of their typings 
grass = pokemon.loc[(pokemon["Type 1"] == "Grass") | (pokemon["Type 2"] == "Grass")]
grass

Unnamed: 0,Pokedex Number,Name,Type 1,Type 2,HP,Attack,Defense,Special Attack,Special Defense,Speed,...,Against Ground,Against Ice,Against Normal,Against Poison,Against Psychic,Against Rock,Against Steel,Against Water,Steps to Hatch,Experience Growth
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.50,5120.0,1059860.0
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.50,5120.0,1059860.0
2,3,Venusaur,Grass,Poison,80,100,123,122,120,80,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.50,5120.0,1059860.0
49,43,Oddish,Grass,Poison,45,50,55,75,65,30,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.50,5120.0,1059860.0
50,44,Gloom,Grass,Poison,60,65,70,85,75,40,...,1.0,2.0,1.0,1.0,2,1.0,1.0,0.50,5120.0,1059860.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
859,829,Gossifleur,Grass,,40,40,60,40,60,10,...,0.5,2.0,1.0,2.0,1,1.0,1.0,0.50,,
860,830,Eldegoss,Grass,,60,50,90,80,120,60,...,0.5,2.0,1.0,2.0,1,1.0,1.0,0.50,,
870,840,Applin,Grass,Dragon,40,40,80,40,40,20,...,0.5,4.0,1.0,2.0,1,1.0,1.0,0.25,,
871,841,Flapple,Grass,Dragon,70,110,80,95,60,70,...,0.5,4.0,1.0,2.0,1,1.0,1.0,0.25,,


In [None]:
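# Creates a new df containing only pokemon that have Fire as one of their typings
# (a single = assigns; == would compare against a name that does not exist yet)
fire = pokemon.loc[(pokemon["Type 1"] == "Fire") | (pokemon["Type 2"] == "Fire")]
fire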
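The Grass and Fire filters follow the same pattern, so they could be wrapped in a small helper. This is only a sketch: the function name `with_type` and the three-row sample are hypothetical, and the column names `Type 1` and `Type 2` are taken from the dataframe above.

```python
import pandas as pd


def with_type(df, type_name):
    """Return the rows of `df` whose primary or secondary type is `type_name`."""
    # Each comparison yields a boolean Series; | combines them element-wise
    mask = (df["Type 1"] == type_name) | (df["Type 2"] == type_name)
    return df.loc[mask]


# Hypothetical sample standing in for the full pokemon dataframe
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Oddish"],
    "Type 1": ["Grass", "Fire", "Grass"],
    "Type 2": ["Poison", None, "Poison"],
})

grass_sample = with_type(sample, "Grass")
fire_sample = with_type(sample, "Fire")
print(grass_sample["Name"].tolist())
print(fire_sample["Name"].tolist())
```

The parentheses around each comparison matter, because `|` binds more tightly than `==` in Python.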

In [13]:
# describe() must be called with parentheses to return summary statistics;
# a bare `grass.describe` only returns the bound method object
grass.describe()

## Basic Navigation  
---
Notice that these datasets are quite large, especially Gens 1 - 7, with over 800 rows and 41 columns, so the output cell doesn't show all of the data. Fortunately, `pandas` has a couple of helpful methods that we can use to view the parts we want to see.

First up, `.head()`. By default, this displays the first 5 rows of the dataframe, but you can change that by passing the number of rows you wish to view.

Now, if you're thinking, "hey, that's useful, but is there another optional argument to reverse it?", the answer is technically no, but there is another method, aptly named `.tail()`. It works exactly like `.head()`, except it shows the *last* 5 rows of a dataframe. Again, you can change this with an optional numeric argument.
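A minimal sketch of both methods, using a hypothetical ten-row dataframe in place of the full pokedex:

```python
import pandas as pd

# Hypothetical stand-in for the pokemon dataframe: Pokedex numbers 1 through 10
df = pd.DataFrame({"Pokedex Number": range(1, 11)})

first_five = df.head()    # default: first 5 rows
first_three = df.head(3)  # first 3 rows
last_five = df.tail()     # default: last 5 rows
last_two = df.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```

Both methods return a new dataframe and leave the original untouched.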

We can use `.columns` to display the headers of our dataframe.

Now, let's use these methods to reorganize our data. Currently, the dataframe headers are ordered alphabetically. While this might be useful in some cases, it isn't in ours. So, let's move things around to resemble a more traditional pokedex.
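One way to reorder columns is to index the dataframe with a list of headers in the order you want. The sketch below assumes a one-row sample with a handful of the real column names; the ordering in `preferred` is an illustration, not necessarily the ordering this notebook settles on.

```python
import pandas as pd

# Hypothetical sample whose columns start out in alphabetical order
df = pd.DataFrame({
    "Attack": [49],
    "Defense": [49],
    "HP": [45],
    "Name": ["Bulbasaur"],
    "Pokedex Number": [1],
    "Type 1": ["Grass"],
})

# Assumed pokedex-style ordering: identity first, then typing, then stats
preferred = ["Pokedex Number", "Name", "Type 1", "HP", "Attack", "Defense"]

# Selecting with a list of headers returns the columns in that order
reordered = df[preferred]
print(reordered.columns.tolist())
```

Any header left out of the list is dropped from the result, so the list should name every column you want to keep.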