## Loading in the Dataset

The cells below load the data that will be used to train our convolutional neural network to classify Pokemon types from their images. The `./data/images/` subdirectory contains the image files, and `./data/pokemon.csv` is a CSV file containing Pokemon names and their respective typings.

In [4]:
# Import necessary packages
import os
import pandas as pd

In [5]:
# Helper function to find the image file path for a given Pokemon name
prefix = './data/images/'
def get_image_path(name):
  # Images are stored as either .png or .jpg, so try both extensions
  for ext in ['.png', '.jpg']:
    potential_path = os.path.join(prefix, name + ext)
    if os.path.exists(potential_path):
      return potential_path
  # No matching image file was found
  return None

In [8]:
# Read in the .csv file
pokemon_df = pd.read_csv('./data/pokemon.csv')

# Add an 'Image' column that maps each Pokemon's name to its image file path
pokemon_df['Image'] = pokemon_df['Name'].apply(get_image_path)

# Keep only the primary type for now; we will deal with dual types later
pokemon_df = pokemon_df[['Name', 'Type1', 'Image']]
pokemon_df = pokemon_df.rename(columns={'Type1': 'Type'})

In [9]:
display(pokemon_df)

| | Name | Type | Image |
|---|---|---|---|
| 0 | bulbasaur | Grass | ./data/images/bulbasaur.png |
| 1 | ivysaur | Grass | ./data/images/ivysaur.png |
| 2 | venusaur | Grass | ./data/images/venusaur.png |
| 3 | charmander | Fire | ./data/images/charmander.png |
| 4 | charmeleon | Fire | ./data/images/charmeleon.png |
| ... | ... | ... | ... |
| 804 | stakataka | Rock | ./data/images/stakataka.jpg |
| 805 | blacephalon | Fire | ./data/images/blacephalon.jpg |
| 806 | zeraora | Electric | ./data/images/zeraora.jpg |
| 807 | meltan | Steel | ./data/images/meltan.jpg |
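
Since `get_image_path` only returns a path when a matching file exists, any Pokemon without an image ends up with a missing value in the 'Image' column. A minimal sketch of how such rows could be found and dropped is shown below. It uses a small toy dataframe rather than the real `pokemon_df`, and the name `missingno` is a made-up entry used only to illustrate a row with no image.

```python
import pandas as pd

# Toy stand-in for pokemon_df (an assumption for illustration only).
# The second row mimics a Pokemon whose image file was not found.
toy_df = pd.DataFrame({
    'Name': ['bulbasaur', 'missingno'],
    'Type': ['Grass', 'Normal'],
    'Image': ['./data/images/bulbasaur.png', None],
})

# Rows where no image path was found have a null 'Image' value
missing = toy_df[toy_df['Image'].isna()]
print(missing['Name'].tolist())

# Drop those rows so every remaining Pokemon has an image to load
toy_df = toy_df.dropna(subset=['Image']).reset_index(drop=True)
```

Whether to drop such rows or fix the file names by hand depends on how many images are missing; this is just one option.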


## Machine Learning Models and One Hot Encoding

Machine learning models don't deal well with categorical data, which in this case is our 'Type' variable: a qualitative label that provides information about the Pokemon, such as its strengths and weaknesses. My first thought when learning about this was to create a dictionary mapping each Pokemon type to an integer, which is called "Integer Encoding". After researching it more, I learned that the model can treat those integers as ordered quantities and end up giving preference to certain types. For example, imagine the dictionary:

```Python
type_dict = {
  'Fire': 1,
  'Water': 2
}
```

With this mapping, the model could read Water (2) as numerically "greater than" Fire (1) and develop a bias toward one type over the other, even though Pokemon types have no natural order. So I learned about One Hot Encoding, i.e. creating bit strings to represent what type the Pokemon is (its primary type for now). We will create a column for each type, and each Pokemon will have a '1' in the column corresponding to its type and a '0' in every other column.
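
As a rough sketch of what one hot encoding looks like in practice, the snippet below uses `pd.get_dummies` on a small toy dataframe. The toy data and the names `toy_df`, `one_hot`, and `encoded_df` are assumptions for illustration; the same idea should carry over to the 'Type' column of `pokemon_df`, though this is not necessarily how the rest of this notebook does it.

```python
import pandas as pd

# Toy stand-in for pokemon_df with a single 'Type' column (illustrative only)
toy_df = pd.DataFrame({
    'Name': ['bulbasaur', 'charmander', 'squirtle'],
    'Type': ['Grass', 'Fire', 'Water'],
})

# Create one column per type; dtype=int gives 0/1 values instead of booleans
one_hot = pd.get_dummies(toy_df['Type'], dtype=int)

# Attach the one hot columns next to the original data
encoded_df = pd.concat([toy_df, one_hot], axis=1)
print(encoded_df)
```

Each row has exactly one '1' across the type columns, so no type is numerically larger than another.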