Today, I'll be cleaning a dataset from the Pokemon game series so it can be used for visualizations in Tableau. The visualizations will display data that nobody asked for, but I need Tableau practice and this sounds fun.

The "Pokedex" data spans from the earliest Pokemon games, Red/Blue, all the way to Pokemon X/Y.

We have four datasets at our disposal:
1. Pokemon - The Pokedex, containing the list of Pokemon, their types, and their stats.
2. Moves - The list of moves Pokemon use in battle.
3. Evolution - The evolution information for each Pokemon.
4. TypeChart - The chart showing which types are effective against which other types.

We are going to import modules and the data, and then take a first look to see if there's any initial cleaning we'd like to perform.

In [28]:
#Importing modules, suppressing overwrite/chained assignment warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plot
pd.options.mode.chained_assignment = None

In [29]:
#Importing the data from each sheet of the excel file

spreadsheet = pd.ExcelFile("pokemon.xlsx")
pokedex = pd.read_excel(spreadsheet,"Pokemon")
moves = pd.read_excel(spreadsheet,"Moves")
evolution = pd.read_excel(spreadsheet,"Evolution")
typechart = pd.read_excel(spreadsheet,"TypeChart")
print(pokedex.head(20))

         #              Name    Type  Total  HP  Attack  Defense  \
0      001         Bulbasaur   GRASS    318  45      49       49   
1      001         Bulbasaur  POISON    318  45      49       49   
2      002           Ivysaur   GRASS    405  60      62       63   
3      002           Ivysaur  POISON    405  60      62       63   
4      003          Venusaur   GRASS    525  80      82       83   
5      003          Venusaur  POISON    525  80      82       83   
6    003.1     Mega Venusaur   GRASS    625  80     100      123   
7    003.1     Mega Venusaur  POISON    625  80     100      123   
8      004        Charmander    FIRE    309  39      52       43   
9      005        Charmeleon    FIRE    405  58      64       58   
10     006         Charizard    FIRE    534  78      84       78   
11     006         Charizard  FLYING    534  78      84       78   
12   006.1  Mega Charizard X    FIRE    634  78     130      111   
13   006.1  Mega Charizard X  DRAGON    634  78 

At a quick glance, we can see duplicate entries for many of the Pokemon in the dataset (Bulbasaur appears twice, while Charmander appears only once). If we look closer, each pair of duplicate entries holds exactly the same information except for the value in the "Type" column.
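
That observation can be checked rather than eyeballed. The snippet below is a minimal sketch, assuming rows that share a "Name" belong to the same Pokemon; the `sample` frame is a hypothetical miniature built from a few values in the output above, and `rows_differ_only_by_type` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical miniature of the "Pokemon" sheet, using values from the head() output
sample = pd.DataFrame({
    "#": ["001", "001", "004", "006", "006"],
    "Name": ["Bulbasaur", "Bulbasaur", "Charmander", "Charizard", "Charizard"],
    "Type": ["GRASS", "POISON", "FIRE", "FIRE", "FLYING"],
    "Total": [318, 318, 309, 534, 534],
    "HP": [45, 45, 39, 78, 78],
})

def rows_differ_only_by_type(df, key="Name", type_col="Type"):
    """Return True if rows sharing `key` agree on every column except `type_col`."""
    other_cols = [c for c in df.columns if c not in (key, type_col)]
    # Count the distinct values of each remaining column within each Name group
    distinct_per_group = df.groupby(key)[other_cols].nunique(dropna=False)
    return bool((distinct_per_group <= 1).all().all())

# How many rows does each Pokemon occupy? (1 = single type, 2 = dual type)
rows_per_name = sample.groupby("Name").size()
print(rows_per_name.value_counts())
print(rows_differ_only_by_type(sample))
```

On the real data, the same two calls would be run against `pokedex` instead of `sample`.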

From my knowledge of the games, every Pokemon has at least one type, and a portion of them have two. We can represent this more cleanly by transforming our "Type" column into a "Type1" and a "Type2" column. Keeping the duplicate rows would make any analysis or visualization of this data tedious, so we perform this cleanup below.

In [22]:
#Reassigning Type to Type1 and Type2 columns
#Every row's own Type is its first type
pokedex["Type1"] = pokedex["Type"]
pokedex["Type2"] = None

#When the next row has the same Name, that row's Type is this row's second type
#(.loc writes into the DataFrame directly instead of relying on chained assignment)
for counter in range(len(pokedex) - 1):
    if pokedex.loc[counter, "Name"] == pokedex.loc[counter + 1, "Name"]:
        pokedex.loc[counter, "Type2"] = pokedex.loc[counter + 1, "Type"]

#Finding our duplicate entries so we can drop them
#(a row is a duplicate when it repeats the Name of the row before it;
#the range includes the last row, so no index has to be added by hand)
indices = []
for counter in range(1, len(pokedex)):
    if pokedex.loc[counter, "Name"] == pokedex.loc[counter - 1, "Name"]:
        indices.append(counter)

#Dropping the duplicate entries
pokedex.drop(indices, inplace = True)

#Deleting the Type column and rearranging the column order so Type1 and Type2 take its place
pokedex.drop(["Type"], axis = 1, inplace = True)
Type1 = pokedex.pop("Type1")
Type2 = pokedex.pop("Type2")
pokedex.insert(2, "Type1", Type1)
pokedex.insert(3, "Type2", Type2)

#Reindexing our rows to reflect the removal of the duplicate entries
pokedex = pokedex.reset_index(drop=True)

#Checking the result
print(pokedex.head(10))
print(pokedex.tail(10))

        #              Name  Type1   Type2  Total  HP  Attack  Defense  \
0     001         Bulbasaur  GRASS  POISON    318  45      49       49   
1     002           Ivysaur  GRASS  POISON    405  60      62       63   
2     003          Venusaur  GRASS  POISON    525  80      82       83   
3   003.1     Mega Venusaur  GRASS  POISON    625  80     100      123   
4     004        Charmander   FIRE    None    309  39      52       43   
5     005        Charmeleon   FIRE    None    405  58      64       58   
6     006         Charizard   FIRE  FLYING    534  78      84       78   
7   006.1  Mega Charizard X   FIRE  DRAGON    634  78     130      111   
8   006.2  Mega Charizard Y   FIRE  FLYING    634  78     104       78   
9     007          Squirtle  WATER    None    314  44      48       65   

   Special Attack  Special Defense  Speed  
0              65               65     45  
1              80               80     60  
2             100              100     80  
3        
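
The same reshaping can also be written without explicit loops. The sketch below is one possible alternative, not the method used in the cell above: it assumes each "Name" is unique to one Pokemon (so Mega forms count as separate Pokemon, as they do in the output) and that no Pokemon has more than two rows. The `raw` frame and the `split_types` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical miniature of the raw "Pokemon" sheet, one row per (Pokemon, type)
raw = pd.DataFrame({
    "#": ["001", "001", "004", "006", "006"],
    "Name": ["Bulbasaur", "Bulbasaur", "Charmander", "Charizard", "Charizard"],
    "Type": ["GRASS", "POISON", "FIRE", "FIRE", "FLYING"],
    "Total": [318, 318, 309, 534, 534],
})

def split_types(df):
    """Collapse one-row-per-type into one row per Pokemon with Type1 and Type2."""
    # Number the rows inside each Name group: 0 = first type, 1 = second type
    slot = df.groupby("Name").cumcount()
    # Second types, looked up by Name (only dual-type Pokemon appear here)
    second = df[slot == 1].set_index("Name")["Type"]
    # Keep the first row of each Pokemon; its Type becomes Type1
    out = df[slot == 0].rename(columns={"Type": "Type1"})
    # Single-type Pokemon have no match in `second`, so their Type2 is missing (NaN)
    out.insert(3, "Type2", out["Name"].map(second))
    return out.reset_index(drop=True)

clean = split_types(raw)
print(clean)
```

One difference from the loop version is that a missing second type shows up as NaN rather than None; both are treated as missing by pandas.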

Now we can export this data to a new Excel file with our changes. Any other cleaning steps we might want to perform would be inserted before the following block.
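
Before writing the file, a few quick checks can confirm the table has the shape we expect. This is only a sketch under assumptions: `check_pokedex` is a hypothetical helper, the three rules it tests are my own choice of sanity checks, and `cleaned` is a tiny stand-in for the real `pokedex` frame.

```python
import pandas as pd

def check_pokedex(df):
    """Return a list of human-readable problems found in a cleaned Pokedex frame."""
    problems = []
    # After the cleanup, each Pokemon should occupy exactly one row
    if df["Name"].duplicated().any():
        problems.append("some Names still appear on more than one row")
    # Every Pokemon has at least one type
    if df["Type1"].isna().any():
        problems.append("some rows have no Type1")
    # A second type, when present, should differ from the first
    if (df["Type1"] == df["Type2"]).any():
        problems.append("some rows repeat the same type in Type1 and Type2")
    return problems

# Hypothetical stand-in for the cleaned data
cleaned = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard"],
    "Type1": ["GRASS", "FIRE", "FIRE"],
    "Type2": ["POISON", None, "FLYING"],
})

print(check_pokedex(cleaned))
```

An empty list means none of the three checks found anything to report.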

In [23]:
# Initiating our workbook xlsx file

write = pd.ExcelWriter('Pokemon_Clean.xlsx', engine='xlsxwriter')

#Writing each sheet page of the workbook

pokedex.to_excel(write, sheet_name='Pokemon') 
moves.to_excel(write, sheet_name='Moves') 
evolution.to_excel(write, sheet_name='Evolution') 
typechart.to_excel(write, sheet_name='TypeChart')

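# Closes the writer and outputs the clean dataset - commented out to avoid repeatedly outputting files.
# (ExcelWriter.save() was removed in pandas 2.0; close() writes the file in old and new versions.)

# write.close()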


Besides removing the duplicate entries caused by the types, there isn't anything else this data needs for my first goals with it. Perhaps more to come!