# Project 7 Report

## Team members: 
Supratik Chanda and Chris Higgs
## Introduction
This project is primarily intended for enjoyment and has little use outside of players of the Pokémon™ video games. Perhaps the only exception is individuals seeking to create similar video games or fantasy universes in general.

Pokémon™ is a fantasy universe originally created by Satoshi Tajiri (CEO of Game Freak™) in 1995 and is currently owned by Nintendo®, Game Freak™, and Creatures™. The universe includes, but is not limited to, animated films and TV series, playing cards, and video games. In the Pokémon™ universe, humans live side-by-side with creatures called "Pokémon" (both singular and plural), even though the central premise of the universe is to collect Pokémon, train them, and make them fight other Pokémon (not unlike slavery and dog fighting). Pokémon are classified in a manner similar to our animal and plant kingdoms. At the very top of the topology, Pokémon are divided into "types," which originally reflected the natural elemental nature of the specific Pokémon (Earth, Water, Air, &c).

Within the Pokémon™ universe, the known types have expanded with continued scientific discovery, and many long-term fans have voiced concern about the seemingly haphazard selection of types. Two examples follow. First, the Steel type: while the existence of metal-like qualities in some Pokémon could be scientifically meaningful, it does not seem to fit with the other, more elemental types, such as Fire and Water. Second, many Pokémon are classified as having a distinct secondary type, which is absurd from a classification standpoint, akin to saying an organism belongs to both the Plant and Animal Kingdoms.

In this project, we attempt to predict a Pokémon species' primary type from "scientific" measurements and observations within the video game version of the Pokémon™ universe. We seek to confirm or refute claims (and findings) from other members of the fanbase that Pokémon type is nearly impossible to predict accurately from the traditional fighting stats, to improve the accuracy of such predictions, and ultimately to make a claim for or against the soundness of the current classification topology.

## Dataset
The data comes from two Pokémon™ datasets available on Kaggle: <br>
"The Complete Pokemon Dataset" (https://www.kaggle.com/rounakbanik/pokemon)<br>
"Pokemon for Data Mining and Machine Learning" (https://www.kaggle.com/alopez247/pokemon)<br>
The first of these makes a decent attempt at completeness.

These datasets contain baseline measurements and observations for 700+ Pokémon species, including health, defense, strength, and agility ratings; primary body color; average height and weight; resistance to the effects of other elements; and, of course, the primary and secondary type classifications.

In munging the data, we found that the two datasets were taken at different points in time: one contains data on species that had not yet been discovered when the other was compiled, as well as updated information on the known species. For simplicity's sake, we restricted the data to the species present in both datasets and used the maximum value wherever measurements differed. We also performed significant attribute manipulation (dropping and renaming columns) for readability and data integrity, and created 28 new boolean attributes from categorical data (one per primary color and one per primary type) for simplicity of analysis.
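A minimal sketch of this reconciliation on two tiny hypothetical frames (the column names mirror the real datasets, but the values are made up): an inner merge keeps only species present in both sources, a row-wise `max` resolves differing measurements, and `pd.get_dummies` is one compact way to produce indicator columns like our `Is_*` attributes. Our actual code builds the indicators with explicit comparisons instead.

```python
import pandas as pd

# Hypothetical toy versions of the two sources; the values are illustrative only.
older = pd.DataFrame({"Number": [1, 2, 3], "HP": [45, 60, 80],
                      "Color": ["Green", "Green", "Red"]})
newer = pd.DataFrame({"pokedex_number": [1, 2, 3, 4], "hp": [45, 62, 80, 39]})

# Inner merge keeps only species present in both datasets (species 4 is dropped).
merged = pd.merge(newer, older, left_on="pokedex_number", right_on="Number")

# Where the sources disagree, keep the larger value.
merged["Hp"] = merged[["hp", "HP"]].max(axis=1)

# One 0/1 indicator column per color value, named like the report's Is_* columns.
dummies = pd.get_dummies(merged["Color"], prefix="Is", dtype=int)
merged = pd.concat([merged, dummies], axis=1)

print(merged[["Number", "Hp", "Is_Green", "Is_Red"]])
```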

## Analysis technique
First, we generated several custom matrices representing topology breakdowns across various attributes. For example, as reported in the Results section, we counted the species within each primary type that exhibit each of the 10 colors as their primary color, producing a matrix of integers that helps identify correlations. This approach is appropriate (albeit not the most intuitive) because it directly shows the ratios between color-type subsets and the computable totals. For example, a primarily green species is approximately six times more likely to be a Grass type than the next most plausible type, Bug.
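As a sketch of an idiomatic alternative to the nested loop in our Analysis code, `pd.crosstab` builds the same kind of type-by-color count matrix in one call. The toy frame below is hypothetical and only stands in for the merged `pokemon` data.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged `pokemon` frame.
toy = pd.DataFrame({
    "Type":  ["Grass", "Grass", "Grass", "Bug", "Bug", "Fire"],
    "Color": ["Green", "Green", "Brown", "Green", "Red", "Red"],
})

# Rows are primary types, columns are primary colors, cells are species counts
# (type/color combinations that never occur get a count of 0).
counts = pd.crosstab(toy["Type"], toy["Color"])
print(counts)

# The kind of ratio quoted in the text: green Grass species versus green Bug species.
ratio = counts.loc["Grass", "Green"] / counts.loc["Bug", "Green"]
print(ratio)
```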

Next, we constructed several Logistic Regression models using various subsets of the available attributes in a one-vs-rest (OVR)-like approach, fitting one binary model per type, and compared precision, recall, and f1-scores. Logistic Regression is appropriate because the majority of our measurements are quantitative, and the OVR-like approach lets us address each type class individually.
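The OVR-like procedure can be sketched as one binary `LogisticRegression` per type, each asked "this type or not?". The sketch uses synthetic data from `make_classification` rather than our species data, and, as a variation not used in our own code, scores each type on held-out folds via `cross_val_predict` instead of on the training data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the species data: 300 records, 3 "types".
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

scores = {}
for label in sorted(set(y)):
    # One binary problem per type: this type versus all others.
    y_bin = (y == label).astype(int)
    # 5-fold cross-validated predictions, so each score reflects held-out records.
    y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y_bin, cv=5)
    scores[label] = f1_score(y_bin, y_pred, zero_division=0)

print(scores)
```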

Finally, we constructed several SVM models using various subsets of the available attributes, repeating this with linear kernels, polynomial kernels of various degrees, and RBF kernels with various gamma values. SVM is particularly appropriate because we are attempting to predict an attribute that classifies a record into one of many categories; furthermore, as with Logistic Regression, most of the available measurements are quantitative.
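A sketch of the kernel sweep, using the same `svm.SVC` settings as our `runSVM` helper but on scikit-learn's bundled iris data as a small multiclass stand-in (the particular degrees and gammas below are illustrative choices). Unlike our binary logistic regression models, `SVC` is trained directly on the multiclass label and handles the classes internally with one-vs-one voting.

```python
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # small multiclass stand-in dataset

# Linear, polynomial (several degrees), and RBF (several gammas) kernels.
models = {"linear": svm.SVC(kernel="linear")}
for degree in (2, 3):
    models["poly_d{}".format(degree)] = svm.SVC(kernel="poly", degree=degree)
for gamma in (0.01, 0.1):
    models["rbf_g{}".format(gamma)] = svm.SVC(kernel="rbf", gamma=gamma)

for name, clf in models.items():
    clf.fit(X, y)
    # Training accuracy only; a held-out split would be needed to judge generalization.
    print(name, clf.score(X, y))
```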

## Results
For the sake of brevity, we report only what we consider our most meaningful results.

Collecting the topology breakdown by species primary type and primary color, we obtain the following matrix of species counts:
<img src="pokemon_graphics/matrix.png">

From these values, we observe that certain primary colors are highly indicative of primary type. For example, there are 41 primarily green Grass type species, nearly six times the 7 green Bug type species (the second most common type among green species); likewise, the second most common color among Grass types, brown, accounts for only 6 species. Other primary colors and types show little to no correlation: there are 9 black Ghost type species, compared with 8 black Dark type species (the second most common type among black species) and 8 purple Ghost type species (the second most common color among Ghost types). We find it especially telling that a primarily black species is slightly more likely to be a Ghost type than a Dark type, which raises the question of what exactly it means to be a Dark type species...
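The comparisons above can be checked directly from the Green and Black columns of the type-by-color matrix printed in the Analysis section of our code (zero counts omitted). The helper names `shares` and `top_two` are ours, for illustration only; `shares` converts one color's counts into the proportion of that color's species belonging to each type.

```python
# Green and Black columns of the type-by-color count matrix (zero counts omitted).
green = {"Grass": 41, "Bug": 7, "Psychic": 6, "Water": 5, "Ground": 4, "Dragon": 4,
         "Poison": 3, "Rock": 3, "Steel": 2, "Normal": 1, "Electric": 1,
         "Fighting": 1, "Flying": 1}
black = {"Ghost": 9, "Dark": 8, "Bug": 3, "Normal": 3, "Psychic": 3,
         "Electric": 2, "Poison": 1, "Ground": 1, "Dragon": 1, "Steel": 1}

def shares(column):
    """Convert one color's counts into P(type | color)."""
    total = sum(column.values())
    return {t: n / total for t, n in column.items()}

def top_two(column):
    """The two most common types for one color."""
    return sorted(column, key=column.get, reverse=True)[:2]

print(top_two(green), round(green["Grass"] / green["Bug"], 2))  # Grass vs Bug ratio
print(top_two(black), shares(black)["Ghost"], shares(black)["Dark"])
```

About 52% of green species are Grass types, while Ghost and Dark split black species roughly 28% to 25%.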

Across several logistic regression models, our results support other fanbase members' findings, specifically that type prediction is poor when based solely on fighting statistics:
<img src="pokemon_graphics/LogReg_FightingStats.png">

Notice that across the Typical and Atypical statistics, the best f1 score we were able to obtain was below 0.25. Further notice that the vast majority of f1 scores are 0, primarily because the models made no positive predictions: each type accounts for only a small fraction of the 721 species, so a model that always predicts "not this type" is already highly accurate. This suggests that there may not be enough data points to predict a type from these attributes alone, and also that these attributes are not particularly indicative of type. We believe the latter because, as a game, each type should be roughly equivalent in power at the base level (much like a game of rock-paper-scissors). This does, however, suggest that the type system may be a poor classification topology for these fighting creatures.
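To see why so many f1 scores are exactly 0, consider a hypothetical rare type with 20 species among the 721: a model that never predicts it is still about 97% accurate, yet has zero precision, recall, and f1 for that type. The sketch passes `zero_division=0` to make explicit the value our code obtains while suppressing `UndefinedMetricWarning`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical rare type: 20 positives among 721 species.
y_true = np.zeros(721, dtype=int)
y_true[:20] = 1

# A model that never predicts the rare type.
y_pred = np.zeros(721, dtype=int)

acc = accuracy_score(y_true, y_pred)
p, r, f, s = precision_recall_fscore_support(y_true, y_pred, labels=[1],
                                             zero_division=0)
print(acc)   # about 0.97 despite missing every positive
print(f[0])  # 0.0
```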

By training several more logistic regression models, we find that primary color alone fails to predict any species' type:
<img src="pokemon_graphics/LogReg_Colors.png">

We note that this is likely because we use only the ten boolean color values, exactly one of which is true for any given species, so the data contain only ten distinct points in 10-D space. We would also be interested to see whether this result would differ if RGB or similar color codes were used instead of a strict ten-class field; we leave this for a future study, as it would require additional data collection.

By training still more logistic regression models, we find that modest predictors can be built from the strength and weakness values against other types. For clarification, each species has a multiplier for the damage it receives from attacks of each type. For example, Squirtle (a Water type species) has an against-Fire value of 0.5, indicating that it takes half the normal damage from Fire type attacks, before considering additional effects. (Conversely, Squirtle's own Water type attacks deal double damage to Fire type species, but that is not what this value records.) Our reported f1 scores, precision, and recall follow:
<img src="pokemon_graphics/LogReg_Ags.png">

Notice that our models predicted Normal type species perfectly (f1, precision, and recall of 1.000), but failed to predict any Flying type species. We attribute this disparity to the difference in sample sizes: there are only 3 Flying type species compared with 93 Normal type species in the dataset. Additionally, the type counts are skewed because a species can have two types; Dragon type species, for instance, often have Flying as their secondary type, so they are never counted as Flying here. We would be interested to see how results differ when considering only a subset of types and attributing subset types to a superset (such as Dragon to Flying), but will leave this for a future project.

We then find that including primary color and generation (a generalization of when the species was discovered) in addition to the strength and weakness measurements generally improves the predictiveness of our models:
<img src="pokemon_graphics/LogReg_Cummulative.png">

Notice that with our previous model, the worst f1 score was 0 (Flying) and the average f1 score was 0.842. By adding color and generation, the worst f1 score becomes 0.773 (Ice), Flying jumps to 0.8, and the average rises to 0.896. Logistic regression can overfit small datasets, and we wonder whether we have started to reach that point; since these scores are computed on the same data the models were trained on, they cannot by themselves reveal overfitting. Finally, we admit that using the inter-type strengths and weaknesses to predict species type is a bit of a cheat, because a species' strengths and weaknesses often stem directly from its type. In essence, nearly EVERY Water type species is strong against nearly EVERY Fire type species simply because of the natural element. Providing these values to our predictor is not unlike providing it the target variable directly.
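To make the "cheating" concern concrete: as far as we can tell, each `against_*` value is the product of the games' type-effectiveness multipliers over the species' type(s). The sketch below uses a deliberately partial, hand-entered chart; the `PARTIAL_CHART` dictionary and `damage_taken` helper are hypothetical and not part of our code or the datasets. For Bulbasaur (Grass/Poison) it reproduces the against_bug, against_electric, against_fairy, against_fight, against_fire, and against_flying values in the first `pokemon1` row shown in our code section.

```python
# Hypothetical partial type chart: (attacking type, defending type) -> multiplier.
# Only the pairs needed here are listed; every unlisted pair defaults to 1.0.
PARTIAL_CHART = {
    ("Fire", "Grass"): 2.0, ("Fire", "Water"): 0.5, ("Fire", "Fire"): 0.5,
    ("Flying", "Grass"): 2.0,
    ("Bug", "Grass"): 2.0, ("Bug", "Poison"): 0.5, ("Bug", "Fire"): 0.5,
    ("Fairy", "Poison"): 0.5, ("Fairy", "Fire"): 0.5,
    ("Fighting", "Poison"): 0.5,
    ("Electric", "Grass"): 0.5,
}

def damage_taken(attack_type, defender_types):
    """Multiplier a species with `defender_types` takes from an `attack_type` attack."""
    multiplier = 1.0
    for defending in defender_types:
        multiplier *= PARTIAL_CHART.get((attack_type, defending), 1.0)
    return multiplier

# Bulbasaur is Grass/Poison: the multipliers are fully determined by its types.
for attack in ["Bug", "Electric", "Fairy", "Fighting", "Fire", "Flying"]:
    print(attack, damage_taken(attack, ["Grass", "Poison"]))
```

Because these values are a deterministic function of the type pair, a classifier given all eighteen of them is largely decoding the type rather than inferring it.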

We then repeated the same process with various SVMs and obtained strikingly different results for the fighting stats:
<img src="pokemon_graphics/SVM_FightingStats.png">

These results directly contradict other fanbase members' findings that type prediction is poor when using only fighting stats: our average f1 score is 0.99. We are astounded by this result and suspect a fault in how we trained the model, but have been unable to identify any issue other than the small sample size.
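One possible explanation we have not ruled out: like our logistic regression scores, these SVM scores are computed on the training data, and an RBF kernel with `gamma=0.1` applied to unscaled stats (values in the tens to hundreds) is so narrow that the SVM can effectively memorize each training point. The sketch below illustrates the effect on synthetic data whose labels are pure noise; it is a hypothetical check, not a rerun of our models.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Random "stats" on a 0-200 scale, with labels that carry no signal at all.
X = rng.uniform(0, 200, size=(400, 7))
y = rng.integers(0, 2, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# Same kernel settings as our runSVM default.
clf = svm.SVC(kernel="rbf", gamma=0.1).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)  # near-perfect: points are memorized
test_acc = clf.score(X_test, y_test)     # roughly chance on held-out points
print(train_acc, test_acc)
```

Standardizing the features (for example with `StandardScaler`) and scoring on held-out data would be a natural next check of our fighting-stat SVMs.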

We are pleased to report that predicting on color alone yields better results with an SVM than with our logistic regression models, although these averages are nothing to be impressed by:
<img src="pokemon_graphics/SVM_Colors.png">

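We note that the difference between our SVM and logistic regression results is likely due to our logistic regression setup being only OVR-like: we trained a separate binary model per type, each of which predicts its type only when its estimated probability exceeds 0.5, rather than a true OVR model that assigns each species the type with the highest score. Because for most colors no single type accounts for a majority of species, the separate binary models rarely produce a positive prediction. Notably, when we attempted to train a linear SVM on these same attributes, the sklearn fit appeared to hang.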
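A minimal sketch of this difference on hypothetical one-hot color data, in which each color's most common type holds only 40% of that color's species: separate binary `LogisticRegression` models (our OVR-like setup) never predict a positive, while scikit-learn's `OneVsRestClassifier`, which picks the highest-scoring type, assigns every species its color's plurality type.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Hypothetical data: 3 colors (A, B, C), 3 types (X, Y, Z), 10 species per color.
colors = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
types = (["X"] * 4 + ["Y"] * 3 + ["Z"] * 3 +
         ["Y"] * 4 + ["X"] * 3 + ["Z"] * 3 +
         ["Z"] * 4 + ["X"] * 3 + ["Y"] * 3)
X = np.array([[c == k for k in "ABC"] for c in colors], dtype=float)  # one-hot colors
y = np.array(types)

# Separate binary models: each needs P(type) > 0.5 to predict "yes".
binary_positives = {
    t: int(LogisticRegression().fit(X, y == t).predict(X).sum()) for t in "XYZ"
}

# True one-vs-rest: the type with the highest score wins, so every record gets a type.
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
ovr_pred = ovr.predict(X)

print(binary_positives)       # no positive predictions for any type
print(ovr_pred[[0, 10, 20]])  # plurality type for colors A, B, C
```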

Finally, we report that, as with logistic regression, training the SVM on inter-type strengths and weaknesses yields good models (in fact, better than logistic regression) and that including more attributes improves predictive results, even to the point of overfitting:
<img src="pokemon_graphics/SVM_Ags.png">
<img src="pokemon_graphics/SVM_Cummulative1.png">
<img src="pokemon_graphics/SVM_Cummulative2.png">

In conclusion, we find that Pokémon type can be predicted from measurements and observations, and predicted well given the right attributes, but we question the legitimacy of these results due to the relatively small sample size. We also find that previous work has likely overlooked the use of SVMs (or that we made some error in training ours). Finally, while we find that the type classification can be backed by predictiveness, we still dislike the current topology from a zoological standpoint. Perhaps as more Pokémon species are discovered, the scientists and professors will recognize the limitations of the current system and revise the topology; until then, "Gotta Catch 'em All!"


%%latex
\newpage

# Project 7 Code


## Import Data

In [None]:
import matplotlib.pyplot as plt

In [2]:
import numpy as np
import pandas as pd
pokemon1 = pd.read_csv('data/pokemon.csv')
pokemon2 = pd.read_csv('data/pokemon_alopez247.csv')
print(len(pokemon1.pokedex_number))
print(len(pokemon2.Number))

print(pokemon1.columns)
print(pokemon2.columns)
display(pokemon1.head())
display(pokemon2.head())

801
721
Index(['abilities', 'against_bug', 'against_dark', 'against_dragon',
       'against_electric', 'against_fairy', 'against_fight', 'against_fire',
       'against_flying', 'against_ghost', 'against_grass', 'against_ground',
       'against_ice', 'against_normal', 'against_poison', 'against_psychic',
       'against_rock', 'against_steel', 'against_water', 'attack',
       'base_egg_steps', 'base_happiness', 'base_total', 'capture_rate',
       'classfication', 'defense', 'experience_growth', 'height_m', 'hp',
       'japanese_name', 'name', 'percentage_male', 'pokedex_number',
       'sp_attack', 'sp_defense', 'speed', 'type1', 'type2', 'weight_kg',
       'generation', 'is_legendary'],
      dtype='object')
Index(['Number', 'Name', 'Type_1', 'Type_2', 'Total', 'HP', 'Attack',
       'Defense', 'Sp_Atk', 'Sp_Def', 'Speed', 'Generation', 'isLegendary',
       'Color', 'hasGender', 'Pr_Male', 'Egg_Group_1', 'Egg_Group_2',
       'hasMegaEvolution', 'Height_m', 'Weight_kg', 'Catch_

Unnamed: 0,abilities,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,...,percentage_male,pokedex_number,sp_attack,sp_defense,speed,type1,type2,weight_kg,generation,is_legendary
0,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,...,88.1,1,65,65,45,grass,poison,6.9,1,0
1,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,...,88.1,2,80,80,60,grass,poison,13.0,1,0
2,"['Overgrow', 'Chlorophyll']",1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,...,88.1,3,122,120,80,grass,poison,100.0,1,0
3,"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,...,88.1,4,60,50,65,fire,,8.5,1,0
4,"['Blaze', 'Solar Power']",0.5,1.0,1.0,1.0,0.5,1.0,0.5,1.0,1.0,...,88.1,5,80,65,80,fire,,19.0,1,0


Unnamed: 0,Number,Name,Type_1,Type_2,Total,HP,Attack,Defense,Sp_Atk,Sp_Def,...,Color,hasGender,Pr_Male,Egg_Group_1,Egg_Group_2,hasMegaEvolution,Height_m,Weight_kg,Catch_Rate,Body_Style
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,...,Green,True,0.875,Monster,Grass,False,0.71,6.9,45,quadruped
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,...,Green,True,0.875,Monster,Grass,False,0.99,13.0,45,quadruped
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,...,Green,True,0.875,Monster,Grass,True,2.01,100.0,45,quadruped
3,4,Charmander,Fire,,309,39,52,43,60,50,...,Red,True,0.875,Monster,Dragon,False,0.61,8.5,45,bipedal_tailed
4,5,Charmeleon,Fire,,405,58,64,58,80,65,...,Red,True,0.875,Monster,Dragon,False,1.09,19.0,45,bipedal_tailed


## Data Munging

In [3]:
# Confirm both datasets list the first 721 species in Pokédex order (1..721)
for i in range(721):
    p1Index = pokemon1.pokedex_number[i]
    p2Index = pokemon2.Number[i]

    if p1Index != i + 1 or p2Index != i + 1:
        print("Line {}: p1 = {}, p2 = {}".format(i + 1, p1Index, p2Index))

# No output indicates that pokemon1 and pokemon2 are
# aligned with records in numerical order

In [4]:
pokemon = pd.merge(pokemon1, pokemon2, left_on='pokedex_number', right_on='Number')

pokemon['Hp'] = pokemon[['hp','HP']].max(axis=1)
pokemon['Atk'] = pokemon[['attack','Attack']].max(axis=1)
pokemon['Def'] = pokemon[['defense','Defense']].max(axis=1)
pokemon['SP_Atk'] = pokemon[['sp_attack','Sp_Atk']].max(axis=1)
pokemon['SP_Def'] = pokemon[['sp_defense','Sp_Def']].max(axis=1)
pokemon['SP'] = pokemon[['speed','Speed']].max(axis=1)
pokemon['Tot'] = pokemon[['base_total','Total']].max(axis=1)

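# Flag missing male ratios with -1 (presumably genderless species).
# Assigning the result back avoids chained inplace fillna, which newer pandas ignores.
pokemon['Pr_Male'] = pokemon['Pr_Male'].fillna(-1)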

dropCols = ['abilities','classfication','japanese_name','pokedex_number',
            'type1','type2','Type_2','height_m','weight_kg','generation',
            'name','hp','HP','attack','Attack','defense','Defense','hasGender',
            'sp_attack','Sp_Atk','sp_defense','Sp_Def','speed','Speed',
            'base_total','Total','isLegendary','capture_rate','Body_Style',
            'Egg_Group_1','Egg_Group_2','percentage_male','hasMegaEvolution']
pokemon = pokemon.drop(columns = dropCols)

colDict = {'Hp':'HP', 'SP_Atk':'Sp_Atk', 'SP_Def':'Sp_Def', 'SP':'Speed',
           'against_bug':'Ag_Bug', 'against_dark':'Ag_Dark', 'against_dragon':'Ag_Drag',
           'against_electric':'Ag_Elec', 'against_fairy':'Ag_Fairy', 'against_fight':'Ag_Fight',
           'against_fire':'Ag_Fire', 'against_flying':'Ag_Fly', 'against_ghost':'Ag_Ghost',
           'against_grass':'Ag_Grass', 'against_ground':'Ag_Grou', 'against_ice':'Ag_Ice',
           'against_normal':'Ag_Norm', 'against_poison':'Ag_Pois', 'against_psychic':'Ag_Psy',
           'against_rock':'Ag_Rock', 'against_steel':'Ag_Steel', 'against_water':'Ag_Water',
           'Type_1':'Type','Generation':'Gen','is_legendary':'Is_Legd',
           'base_egg_steps':'Egg_Steps','base_happiness':'Happ',
           'experience_growth':'Exp_Grow'
           }
pokemon = pokemon.rename(columns=colDict)

# One 0/1 indicator column per primary color
for color in ['Green', 'Red', 'Blue', 'White', 'Brown',
              'Yellow', 'Purple', 'Pink', 'Grey', 'Black']:
    pokemon['Is_{}'.format(color)] = (pokemon.Color == color) * 1

# One 0/1 indicator column per primary type
for t in ['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison',
          'Electric', 'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock',
          'Ghost', 'Ice', 'Dragon', 'Dark', 'Steel', 'Flying']:
    pokemon['Is_{}'.format(t)] = (pokemon.Type == t) * 1

#print(len(pokemon))
print(pokemon.columns)
#display(pokemon.head())


Index(['Ag_Bug', 'Ag_Dark', 'Ag_Drag', 'Ag_Elec', 'Ag_Fairy', 'Ag_Fight',
       'Ag_Fire', 'Ag_Fly', 'Ag_Ghost', 'Ag_Grass', 'Ag_Grou', 'Ag_Ice',
       'Ag_Norm', 'Ag_Pois', 'Ag_Psy', 'Ag_Rock', 'Ag_Steel', 'Ag_Water',
       'Egg_Steps', 'Happ', 'Exp_Grow', 'Is_Legd', 'Number', 'Name', 'Type',
       'Gen', 'Color', 'Pr_Male', 'Height_m', 'Weight_kg', 'Catch_Rate', 'HP',
       'Atk', 'Def', 'Sp_Atk', 'Sp_Def', 'Speed', 'Tot', 'Is_Green', 'Is_Red',
       'Is_Blue', 'Is_White', 'Is_Brown', 'Is_Yellow', 'Is_Purple', 'Is_Pink',
       'Is_Grey', 'Is_Black', 'Is_Grass', 'Is_Fire', 'Is_Water', 'Is_Bug',
       'Is_Normal', 'Is_Poison', 'Is_Electric', 'Is_Ground', 'Is_Fairy',
       'Is_Fighting', 'Is_Psychic', 'Is_Rock', 'Is_Ghost', 'Is_Ice',
       'Is_Dragon', 'Is_Dark', 'Is_Steel', 'Is_Flying'],
      dtype='object')


## Analysis

In [5]:
print(pokemon.Type.unique())
print(pokemon.Color.unique())

['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'
 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'
 'Flying']
['Green' 'Red' 'Blue' 'White' 'Brown' 'Yellow' 'Purple' 'Pink' 'Grey'
 'Black']


In [6]:
def getMatchingColor(df, color):
    return df[df.Color == color]

def getMatchingType(df, expectedType):
    return df[df.Type == expectedType]

def getMatching(df, color, expectedType):
    return df[(df.Color == color) & (df.Type == expectedType)]
    

matrix = []
allColors = pokemon.Color.unique()
allTypes = pokemon.Type.unique()

for ti in range(len(allTypes)):
    line = []
    for ci in range(len(allColors)):
        line.append(len(getMatching(pokemon, allColors[ci], allTypes[ti])))
    matrix.append(line)
display(allTypes)
display(allColors)
display(matrix)

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

array(['Green', 'Red', 'Blue', 'White', 'Brown', 'Yellow', 'Purple',
       'Pink', 'Grey', 'Black'], dtype=object)

[[41, 1, 5, 3, 6, 3, 0, 4, 3, 0],
 [0, 27, 0, 0, 12, 8, 0, 0, 0, 0],
 [5, 9, 59, 6, 4, 3, 7, 8, 4, 0],
 [7, 16, 4, 4, 3, 12, 6, 0, 8, 3],
 [1, 4, 6, 9, 37, 5, 5, 13, 10, 3],
 [3, 0, 5, 0, 2, 0, 17, 0, 0, 1],
 [1, 3, 6, 4, 0, 16, 0, 1, 3, 2],
 [4, 2, 1, 0, 12, 2, 2, 0, 6, 1],
 [0, 0, 1, 8, 0, 0, 1, 7, 0, 0],
 [1, 2, 4, 2, 6, 2, 2, 0, 6, 0],
 [6, 1, 6, 6, 4, 6, 7, 6, 2, 3],
 [3, 2, 9, 0, 11, 3, 2, 1, 10, 0],
 [0, 0, 0, 1, 4, 1, 8, 0, 0, 9],
 [0, 2, 10, 5, 3, 0, 0, 1, 2, 0],
 [4, 2, 9, 2, 1, 1, 3, 0, 1, 1],
 [0, 4, 5, 1, 2, 1, 3, 0, 4, 8],
 [2, 0, 4, 1, 3, 1, 0, 0, 10, 1],
 [1, 0, 0, 0, 0, 0, 2, 0, 0, 0]]

In [7]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

def runLogisticRegression(attribList, target):
    import warnings
    from sklearn.exceptions import UndefinedMetricWarning
    warnings.filterwarnings("ignore", category=UndefinedMetricWarning)
    
    X = pokemon[attribList]
    y = pokemon[target]
    
    lm = LogisticRegression(solver='lbfgs', max_iter=1000)
    lm.fit(X,y)
    
    # Predictions (and the scores below) are computed on the training data itself
    y_pred = lm.predict(X)
    
    p,r,f,s = precision_recall_fscore_support(y, y_pred, labels=[1])
    print('{}: \tf = {:.3f}\tp = {:.3f}\tr = {:.3f}'.format(target,f[0],p[0],r[0]))
    
def runLogRegForAllTypes(attribList, name):
    print(name)
    for t in allTypes:
        runLogisticRegression(attribList, 'Is_{}'.format(t))
    print()

In [8]:
from sklearn import svm
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report

def runSVM(attribList, target, allLabels=False, name="", kernel='rbf', degree=3, gamma=.1):
    print(name)
    
    X = pokemon[attribList]
    y = pokemon[target]
    
    if kernel == 'linear':
        clf = svm.SVC(kernel='linear')
    elif kernel == 'poly':
        clf = svm.SVC(kernel='poly', degree=degree)
    else:
        clf = svm.SVC(kernel='rbf', gamma=gamma)
    clf.fit(X, y)
    
    # Predictions (and the report below) are computed on the training data itself
    y_pred = clf.predict(X)
    if allLabels:
        print(classification_report(y, y_pred))
        
    else:
        print(classification_report(y, y_pred, labels=[1]))

In [9]:
allAgs = ['Ag_Bug', 'Ag_Dark', 'Ag_Drag', 'Ag_Elec', 'Ag_Fairy', 'Ag_Fight',
       'Ag_Fire', 'Ag_Fly', 'Ag_Ghost', 'Ag_Grass', 'Ag_Grou', 'Ag_Ice',
       'Ag_Norm', 'Ag_Pois', 'Ag_Psy', 'Ag_Rock', 'Ag_Steel', 'Ag_Water']

typicalStats = ['HP','Atk','Def','Sp_Atk','Sp_Def','Speed','Tot']

atypicalStats = ['Egg_Steps', 'Happ', 'Exp_Grow', 'Is_Legd',
       'Gen', 'Pr_Male', 'Height_m', 'Weight_kg', 'Catch_Rate']

allCols = ['Is_Green', 'Is_Red', 'Is_Blue', 'Is_White', 'Is_Brown', 
           'Is_Yellow', 'Is_Purple', 'Is_Pink','Is_Grey', 'Is_Black']

In [10]:
runLogRegForAllTypes(allAgs, "All Ag_* attributes")
runLogRegForAllTypes(typicalStats, "Typical Stats")
runLogRegForAllTypes(atypicalStats, "Atypical Stats")
runLogRegForAllTypes(allCols, "All Color Boolean Stats")
runLogRegForAllTypes(allAgs + allCols, "Ag_* + Colors")
runLogRegForAllTypes(allAgs + allCols + ['Gen'], "Ag_* + Colors + Generation")

All Ag_* attributes
Is_Grass: 	f = 0.936	p = 0.880	r = 1.000
Is_Fire: 	f = 0.948	p = 0.920	r = 0.979
Is_Water: 	f = 0.940	p = 0.918	r = 0.962
Is_Bug: 	f = 0.927	p = 0.950	r = 0.905
Is_Normal: 	f = 1.000	p = 1.000	r = 1.000
Is_Poison: 	f = 0.808	p = 0.875	r = 0.750
Is_Electric: 	f = 0.901	p = 0.914	r = 0.889
Is_Ground: 	f = 0.825	p = 0.788	r = 0.867
Is_Fairy: 	f = 0.944	p = 0.895	r = 1.000
Is_Fighting: 	f = 0.815	p = 0.759	r = 0.880
Is_Psychic: 	f = 0.939	p = 0.902	r = 0.979
Is_Rock: 	f = 0.841	p = 0.787	r = 0.902
Is_Ghost: 	f = 0.936	p = 0.917	r = 0.957
Is_Ice: 	f = 0.773	p = 0.810	r = 0.739
Is_Dragon: 	f = 0.846	p = 0.786	r = 0.917
Is_Dark: 	f = 0.947	p = 0.931	r = 0.964
Is_Steel: 	f = 0.821	p = 0.941	r = 0.727
Is_Flying: 	f = 0.000	p = 0.000	r = 0.000

Typical Stats
Is_Grass: 	f = 0.000	p = 0.000	r = 0.000
Is_Fire: 	f = 0.000	p = 0.000	r = 0.000
Is_Water: 	f = 0.000	p = 0.000	r = 0.000
Is_Bug: 	f = 0.031	p = 1.000	r = 0.016
Is_Normal: 	f = 0.106	p = 0.300	r = 0.065
Is_Poison: 	f = 0.

In [11]:
runSVM(allAgs, 'Type', True, "All Ag_*")
runSVM(typicalStats, 'Type', True, "Typical Stats")
runSVM(atypicalStats, 'Type', True, "Atypical Stats")
runSVM(allCols, 'Type', True, "All Colors")
runSVM(allAgs + allCols, 'Type', True, "Ag_* + Colors")
runSVM(allAgs + allCols + ['Gen'], 'Type', True, "Ag_* + Colors + Generation")

runSVM(allAgs + typicalStats + atypicalStats + allCols + ['Gen'], 
       'Type', True, "All Attribs")

All Ag_*
              precision    recall  f1-score   support

         Bug       0.98      0.92      0.95        63
        Dark       0.96      0.96      0.96        28
      Dragon       0.86      1.00      0.92        24
    Electric       1.00      0.97      0.99        36
       Fairy       1.00      1.00      1.00        17
    Fighting       0.96      0.92      0.94        25
        Fire       0.98      1.00      0.99        47
      Flying       0.00      0.00      0.00         3
       Ghost       0.88      1.00      0.94        23
       Grass       0.96      1.00      0.98        66
      Ground       0.96      0.80      0.87        30
         Ice       1.00      0.87      0.93        23
      Normal       0.99      0.96      0.97        93
      Poison       1.00      0.93      0.96        28
     Psychic       1.00      0.94      0.97        47
        Rock       0.76      1.00      0.86        41
       Steel       0.86      0.86      0.86        22
       Water      