# 4. Pokedex
**Data extracted from [7,000 Labeled Pokemon](https://www.kaggle.com/datasets/lantian773030/pokemonclassification)**.

The purpose of this project is to train several classifier models to recognize first-generation Pokemon, like the Pokedex gadget from the anime and video games.

Answer these questions:
1. What problem do you want to solve? Which questions do you expect to answer with the result?
* I want to recreate the Pokedex gadget from the Pokemon video games, using a dataset of first-generation Pokemon.
2. What data is needed and available to solve the problem? Is there a dataset you can use? Is it feasible to generate and build the dataset?
* The dataset must be a collection of images of first-generation Pokemon. The dataset comes from **[this link](https://www.kaggle.com/datasets/lantian773030/pokemonclassification)**. If the number of images is not enough
3. What kind of analysis do you intend to perform with this dataset?
* A classification analysis.
4. What result do you expect? What is the minimum success rate for a model to be considered successful?
* I will consider a model successful if it reaches an accuracy of at least 80%.

## 0. From image to Image Embeddings
To run this Jupyter Notebook, you first need to create the image embeddings from the dataset. Follow these steps before moving on to the next sections:
1. Download the dataset from the link above and unzip the data into a folder called `data/labeled_pokemon`.
2. Open the Orange file `dataset_generator.ows` and verify:
    * The 'read images' module loads `data/labeled_pokemon`.
    * The 'Save csv' module saves in `data/dataset.csv`.
3. Run the Orange file.

Once the dataset is saved, you can use it in this Jupyter Notebook.
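Before training, it can help to confirm that the exported table has the expected shape. The following is a minimal sketch, not part of the original notebook: `check_embedding_frame` is a hypothetical helper, and the default of 2048 embedding columns named `n0` to `n2047` is an assumption inferred from the `describe()` output further down.

```python
import pandas as pd

def check_embedding_frame(df, n_features=2048, label_col="category"):
    """Return a list of problems found in an exported embedding table.

    Hypothetical helper: the column naming (n0, n1, ...) and the label
    column name are assumptions based on the notebook's own outputs.
    """
    problems = []
    expected = [f"n{i}" for i in range(n_features)]
    missing = [c for c in expected if c not in df.columns]
    if missing:
        problems.append(f"missing {len(missing)} embedding columns, e.g. {missing[0]}")
    if label_col not in df.columns:
        problems.append(f"missing label column '{label_col}'")
    elif df[label_col].isna().any():
        problems.append("some rows have no label")
    return problems

# Tiny synthetic frame standing in for data/dataset.csv
demo = pd.DataFrame({"n0": [0.1, 0.2], "n1": [0.0, 0.3], "category": ["Abra", "Arbok"]})
demo_problems = check_embedding_frame(demo, n_features=2)
```

In the notebook itself, this would be called as `check_embedding_frame(pd.read_csv('./data/dataset.csv'))`; an empty list means no problems were detected.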

In [7]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## 1. Read Data

In [8]:
df = pd.read_csv('./data/dataset.csv')

## 2. Data Preprocessing

In [9]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| n0 | 6817.0 | 0.287901 | 0.235049 | 0.0 | 0.114780 | 0.229938 | 0.396214 | 1.882315e+00 |
| n1 | 6817.0 | 0.208318 | 0.198762 | 0.0 | 0.075262 | 0.152767 | 0.276005 | 1.901746e+00 |
| n2 | 6817.0 | 0.172583 | 0.216252 | 0.0 | 0.027973 | 0.098030 | 0.232473 | 2.111963e+00 |
| n3 | 6817.0 | 0.318159 | 0.258037 | 0.0 | 0.125018 | 0.253430 | 0.445456 | 1.967986e+00 |
| n4 | 6817.0 | 0.394518 | 0.254472 | 0.0 | 0.205306 | 0.349811 | 0.535109 | 1.977203e+00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| n2046 | 6817.0 | 0.401140 | 0.372707 | 0.0 | 0.116903 | 0.294443 | 0.583472 | 2.810535e+00 |
| n2047 | 6817.0 | 0.256555 | 0.278577 | 0.0 | 0.048833 | 0.165491 | 0.374958 | 2.252875e+00 |
| size | 6817.0 | 66466.247616 | 128202.460388 | 1817.0 | 15732.000000 | 31531.000000 | 78327.000000 | 4.579587e+06 |
| width | 6817.0 | 469.861963 | 343.178512 | 43.0 | 221.000000 | 354.000000 | 667.000000 | 5.000000e+03 |


In [10]:
df['category'].unique()

array(['Abra', 'Aerodactyl', 'Alakazam', 'Alolan Sandslash', 'Arbok',
       'Arcanine', 'Articuno', 'Beedrill', 'Bellsprout', 'Blastoise',
       'Bulbasaur', 'Butterfree', 'Caterpie', 'Chansey', 'Charizard',
       'Charmander', 'Charmeleon', 'Clefable', 'Clefairy', 'Cloyster',
       'Cubone', 'Dewgong', 'Diglett', 'Ditto', 'Dodrio', 'Doduo',
       'Dragonair', 'Dragonite', 'Dratini', 'Drowzee', 'Dugtrio', 'Eevee',
       'Ekans', 'Electabuzz', 'Electrode', 'Exeggcute', 'Exeggutor',
       'Farfetchd', 'Fearow', 'Flareon', 'Gastly', 'Gengar', 'Geodude',
       'Gloom', 'Golbat', 'Goldeen', 'Golduck', 'Golem', 'Graveler',
       'Grimer', 'Growlithe', 'Gyarados', 'Haunter', 'Hitmonchan',
       'Hitmonlee', 'Horsea', 'Hypno', 'Ivysaur', 'Jigglypuff', 'Jolteon',
       'Jynx', 'Kabuto', 'Kabutops', 'Kadabra', 'Kakuna', 'Kangaskhan',
       'Kingler', 'Koffing', 'Krabby', 'Lapras', 'Lickitung', 'Machamp',
       'Machoke', 'Machop', 'Magikarp', 'Magmar', 'Magnemite', 'Magneton',
     

In [16]:
X, y = df.drop(['size', 'width', 'height', 'category', 'image', 'image name'], axis=1), df['category']
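The cell above builds the feature matrix by dropping every metadata column by name. An alternative, sketched here under the assumption that all embedding columns follow the `n<number>` naming seen in the `describe()` output, is to select the features by pattern, so an unexpected extra metadata column cannot leak into `X`. The helper name `split_features_labels` is hypothetical.

```python
import pandas as pd

def split_features_labels(df, label_col="category"):
    """Keep only columns named n0, n1, ... as features (assumed naming)."""
    feature_cols = df.columns[df.columns.str.fullmatch(r"n\d+")]
    return df[feature_cols], df[label_col]

# Synthetic stand-in for the real dataset
demo = pd.DataFrame({
    "n0": [0.1, 0.2],
    "n1": [0.0, 0.3],
    "size": [1817, 2000],
    "image name": ["a", "b"],
    "category": ["Abra", "Arbok"],
})
X_demo, y_demo = split_features_labels(demo)
```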

In [28]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
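`LabelEncoder` replaces each Pokemon name with an integer, so model predictions come back as integers too. A small sketch of the round trip on a made-up label list (the variable names here are illustrative only) shows how `inverse_transform` recovers the names:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
# Classes are sorted alphabetically: Abra=0, Eevee=1, Gengar=2
encoded = le_demo.fit_transform(["Gengar", "Abra", "Gengar", "Eevee"])

# Map integer predictions back to Pokemon names
decoded = le_demo.inverse_transform(np.array([2, 0]))
```

In the notebook, the same call on the fitted `le` (for example `le.inverse_transform(tree_pred)`) would turn a prediction array into readable names.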


In [17]:
seed = 420
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
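With many classes and only a few dozen images per class, a purely random split can leave some Pokemon under-represented in the test set. One option, shown here as a sketch on synthetic data rather than as the notebook's method, is the `stratify` argument of `train_test_split`, which keeps class proportions equal in both splits. It requires at least two samples per class.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(420)
X_demo = rng.random((40, 5))          # 40 fake samples, 5 fake features
y_demo = np.repeat([0, 1, 2, 3], 10)  # 4 classes, 10 samples each

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=420, stratify=y_demo
)

# Each class contributes the same number of test samples
test_counts = np.bincount(y_te)
```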

## 3. Model Creation

In [31]:
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier()

In [32]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier()


In [33]:
from sklearn.neural_network import MLPClassifier
neural_network = MLPClassifier()

In [34]:
from sklearn.svm import SVC
svm = SVC()
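All four models above use default settings and no fixed random state, so results can vary between runs. The sketch below is one possible variation, not the configuration used for the scores reported in this notebook: it seeds the stochastic models and wraps the scale-sensitive ones (`MLPClassifier`, `SVC`) in a pipeline with `StandardScaler`. The `max_iter=500` value is an arbitrary assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

seed = 420  # same seed value the notebook uses for the train/test split

models = {
    "tree": DecisionTreeClassifier(random_state=seed),
    "random_forest": RandomForestClassifier(random_state=seed),
    # Scaling is fitted on the training data only, inside the pipeline
    "neural_network": make_pipeline(
        StandardScaler(), MLPClassifier(random_state=seed, max_iter=500)
    ),
    "svm": make_pipeline(StandardScaler(), SVC()),
}
```

Keeping the models in a dictionary also lets the fit, predict, and scoring steps run in a single loop over `models.items()`.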

## 4. Adjust Model with Historic Data

In [35]:
tree.fit(X_train, y_train)

DecisionTreeClassifier()

In [36]:
random_forest.fit(X_train, y_train)

RandomForestClassifier()

In [37]:
neural_network.fit(X_train, y_train)

MLPClassifier()

In [38]:
svm.fit(X_train, y_train)

SVC()

## 5. Prediction for New Data

In [42]:
tree_pred = tree.predict(X_test)

In [43]:
random_forest_pred = random_forest.predict(X_test)

In [44]:
neural_network_pred = neural_network.predict(X_test)

In [45]:
svm_predict = svm.predict(X_test)

## 6. Visualization of Results

In [46]:
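from sklearn.model_selection import cross_val_score

# Note: cross_val_score clones each estimator and refits it on folds of the
# data it receives. Here that data is the test split only, so these scores do
# not measure the models fitted in section 4 or the predictions from section 5.
# Score order: tree, neural_network, svm, random_forest.
scores = [
    cross_val_score(tree, X_test, y_test, cv=3, scoring='accuracy').mean(),
    cross_val_score(neural_network, X_test, y_test, cv=3, scoring='accuracy').mean(),
    cross_val_score(svm, X_test, y_test, cv=3, scoring='accuracy').mean(),
    cross_val_score(random_forest, X_test, y_test, cv=3, scoring='accuracy').mean(),
]

print(scores)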



[0.057179971276887585, 0.3599522357231608, 0.2060124897129302, 0.17888528505268594]
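To score the models that were actually fitted on the training split, the predictions from section 5 can be compared against `y_test` directly and checked against the 80% target stated in the introduction. This is a sketch with a hypothetical helper, `evaluate_predictions`, demonstrated on made-up labels; it does not reproduce or replace the numbers printed above.

```python
from sklearn.metrics import accuracy_score

def evaluate_predictions(y_true, predictions, target=0.80):
    """Map each model name to (accuracy, reached_target).

    `predictions` is a dict of model name -> predicted labels.
    """
    report = {}
    for name, y_pred in predictions.items():
        acc = float(accuracy_score(y_true, y_pred))
        report[name] = (acc, bool(acc >= target))
    return report

# Made-up labels for illustration
y_true_demo = [0, 1, 2, 2, 1]
demo_report = evaluate_predictions(
    y_true_demo,
    {"perfect": [0, 1, 2, 2, 1], "weak": [0, 0, 0, 2, 1]},
)
```

In the notebook this would be called as `evaluate_predictions(y_test, {"tree": tree_pred, "random_forest": random_forest_pred, "neural_network": neural_network_pred, "svm": svm_predict})`.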


In [None]:
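from sklearn.metrics import confusion_matrix
sns.heatmap(confusion_matrix(y_test, neural_network_pred))  # assumed intent: confusion matrix of the neural network predictions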
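With well over a hundred classes, a heatmap of the full confusion matrix is hard to read. A complementary view, sketched here with a hypothetical helper `top_confusions` and made-up labels, lists the class pairs that are confused most often:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def top_confusions(y_true, y_pred, class_names, k=3):
    """Return up to k (true_name, predicted_name, count) off-diagonal pairs."""
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(class_names)))
    np.fill_diagonal(cm, 0)  # ignore correct predictions
    flat = np.argsort(cm, axis=None)[::-1][:k]  # largest counts first
    rows, cols = np.unravel_index(flat, cm.shape)
    return [
        (class_names[r], class_names[c], int(cm[r, c]))
        for r, c in zip(rows, cols)
        if cm[r, c] > 0
    ]

# Made-up example: Abra is mistaken for Eevee twice, Eevee for Gengar once
names = ["Abra", "Eevee", "Gengar"]
pairs = top_confusions([0, 0, 0, 1, 1, 2], [1, 1, 0, 1, 2, 2], names)
```

In the notebook, `class_names` would be `le.classes_` and the labels would be `y_test` and one of the prediction arrays.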