# PokéData: A Data-Driven Victory

__Context:__ Professor Oak hypothesises that pokémon have been getting stronger with each generation, but some of his colleagues disagree. With the pokémon championship battle approaching, he wants to use the event to test his theory. He tasks a promising young trainer with fighting the undefeated champion Cynthia using a carefully constructed team of 6 pokémon and bringing an end to her win streak.

__Objective:__ Create a team of 6 non-legendary pokémon that can defeat Cynthia's team, and analyse trends in the data in order to assess Professor Oak's hypothesis.

## Setup: Cleaning the Data

In [137]:
# import libraries

import pandas as pd
import sqlite3

In [138]:
# loading dataset

df = pd.read_csv('C:/Users/danny/Documents/Projects/PokeData/MainStats.csv')
df.tail()

Unnamed: 0,ID,Name,Total,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,Height,Weight
1204,1021,Raging Bolt,590,125,73,91,137,89,75,Electric,Dragon,,
1205,1022,Iron Boulder,590,90,120,80,68,108,124,Rock,Psychic,,
1206,1023,Iron Crown,590,90,72,100,122,108,98,Steel,Psychic,,
1207,1024,Terapagos,450,90,65,85,65,85,60,Normal,,,
1208,1025,Pecharunt,600,88,88,160,88,88,88,Poison,Ghost,,


This dataset is missing some key information that we will need, such as the generation of each pokémon and its legendary status. To fix this, we need to find a dataset that has this information and join it to the one above.

In [139]:
# loading other dataset

df2 = pd.read_csv('C:/Users/danny/Documents/Projects/PokeData/SecondaryStats.csv')
df2.tail()

Unnamed: 0,number,name,type1,type2,total,hp,attack,defense,sp_attack,sp_defense,speed,generation,legendary
1067,896,Glastrier,Ice,,580,100,145,130,65,110,30,8,True
1068,897,Spectrier,Ghost,,580,100,65,60,145,80,130,8,True
1069,898,Calyrex,Psychic,Grass,500,100,80,80,80,80,80,8,True
1070,898,Ice Rider Calyrex,Psychic,Ice,680,100,165,150,85,130,50,8,True
1071,898,Shadow Rider Calyrex,Psychic,Ghost,680,100,85,80,165,100,150,8,True


In [140]:
# connecting to database and creating SQL tables

cnn = sqlite3.connect(':memory:')

df.to_sql('main_stats', cnn)
df2.to_sql('secondary_stats', cnn)

1072

In [141]:
# join tables 

query = '''
SELECT main_stats.ID, main_stats.Name, main_stats.HP, main_stats.Attack, main_stats.Defense, main_stats.SpAtk, main_stats.SpDef, main_stats.Speed,
main_stats.Type1, main_stats.Type2, secondary_stats.generation, secondary_stats.legendary
FROM main_stats
LEFT JOIN secondary_stats
ON main_stats.ID = secondary_stats.number;
'''

result = pd.read_sql_query(query, con=cnn)
result.tail()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
1603,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,,
1604,1022,Iron Boulder,90,120,80,68,108,124,Rock,Psychic,,
1605,1023,Iron Crown,90,72,100,122,108,98,Steel,Psychic,,
1606,1024,Terapagos,90,65,85,65,85,60,Normal,,,
1607,1025,Pecharunt,88,88,160,88,88,88,Poison,Ghost,,


The two tables have different numbers of entries: the secondary dataset stops at ID 898, so every pokémon after that has missing values in the generation and legendary columns of the joined result. Let's fix this.
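To illustrate how a `LEFT JOIN` behaves when the two tables do not line up, here is a minimal sketch on made-up toy tables. The table names `main_toy` and `secondary_toy` and their rows are hypothetical stand-ins, not the real datasets. A left row with no match keeps its place but gets `NULL` in the joined columns, and a left row with several matches is repeated once per match.

```python
import sqlite3

# Hypothetical toy tables, used only to illustrate LEFT JOIN behaviour
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE main_toy (ID INTEGER, Name TEXT)')
con.execute('CREATE TABLE secondary_toy (number INTEGER, generation INTEGER)')

con.executemany('INSERT INTO main_toy VALUES (?, ?)',
                [(898, 'Calyrex'), (1025, 'Pecharunt')])
# ID 898 appears twice on the right-hand side; ID 1025 does not appear at all
con.executemany('INSERT INTO secondary_toy VALUES (?, ?)',
                [(898, 8), (898, 8)])

rows = con.execute('''
SELECT main_toy.ID, main_toy.Name, secondary_toy.generation
FROM main_toy
LEFT JOIN secondary_toy
ON main_toy.ID = secondary_toy.number
ORDER BY main_toy.ID;
''').fetchall()

for row in rows:
    print(row)
```

In this toy case the two left rows become three joined rows, and the unmatched one carries `None` (SQL `NULL`) for its generation.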

In [142]:
# update tables to show generation and legendary status for all null values

result.to_sql('result', cnn)

query = '''
UPDATE result
SET legendary = True
WHERE ID IN (905, 1001, 1002, 1003, 1004, 1007, 1008, 1014, 1015, 1016, 1017, 1024);
'''

query2 = '''
UPDATE result
SET legendary = False
WHERE legendary IS NULL
'''

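# the secondary dataset ends at ID 898; IDs 899-905 still belong to generation 8
# and the first pokémon in generation 9 has an ID of 906
query3 = '''
UPDATE result
SET generation = CASE WHEN ID <= 905 THEN 8 ELSE 9 END
WHERE ID > 898
'''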

cnn.execute(query)
cnn.execute(query2)
cnn.execute(query3)
cnn.commit()

test_query = '''
SELECT *
FROM result;
'''

filtered_result = pd.read_sql_query(test_query, con=cnn)
filtered_result.tail()

Unnamed: 0,index,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
1603,1603,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,9.0,0.0
1604,1604,1022,Iron Boulder,90,120,80,68,108,124,Rock,Psychic,9.0,0.0
1605,1605,1023,Iron Crown,90,72,100,122,108,98,Steel,Psychic,9.0,0.0
1606,1606,1024,Terapagos,90,65,85,65,85,60,Normal,,9.0,1.0
1607,1607,1025,Pecharunt,88,88,160,88,88,88,Poison,Ghost,9.0,0.0
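One way to confirm that updates like these leave no gaps is to count the remaining `NULL` values directly in SQL. The sketch below runs on a small stand-in table (`result_toy`, with hypothetical rows) rather than the real `result` table, and the helper `count_nulls` is introduced purely for illustration. `COUNT(*)` counts every row while `COUNT(column)` skips `NULL`s, so their difference is the number of missing values. It uses `1` and `0` instead of `True` and `False`, an assumption made so the example also runs on older SQLite versions.

```python
import sqlite3

# Hypothetical stand-in for the joined table
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE result_toy (ID INTEGER, generation INTEGER, legendary INTEGER)')
con.executemany('INSERT INTO result_toy VALUES (?, ?, ?)',
                [(898, 8, 1), (1024, None, None), (1025, None, None)])

def count_nulls(con):
    """Return (missing generation values, missing legendary values)."""
    return con.execute('''
    SELECT COUNT(*) - COUNT(generation), COUNT(*) - COUNT(legendary)
    FROM result_toy;
    ''').fetchone()

before = count_nulls(con)

# Flag the known legendary first, then default every remaining NULL
con.execute('UPDATE result_toy SET legendary = 1 WHERE ID IN (1024)')
con.execute('UPDATE result_toy SET legendary = 0 WHERE legendary IS NULL')
con.execute('UPDATE result_toy SET generation = 9 WHERE generation IS NULL')
con.commit()

after = count_nulls(con)
print(before, after)
```

The order of the two legendary updates matters: the blanket `IS NULL` update has to run after the explicit list, otherwise it would still work here but would hide any ID that was left out of the list by mistake.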


In [143]:
# deleting duplicate values

delete = '''
DELETE FROM result
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM result
    GROUP BY ID
);
'''

cnn.execute(delete)
cnn.commit()

test_delete = '''
SELECT *
FROM result;
'''

delete_result = pd.read_sql_query(test_delete, con=cnn)
delete_result.tail()

Unnamed: 0,index,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
1020,1603,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,9.0,0.0
1021,1604,1022,Iron Boulder,90,120,80,68,108,124,Rock,Psychic,9.0,0.0
1022,1605,1023,Iron Crown,90,72,100,122,108,98,Steel,Psychic,9.0,0.0
1023,1606,1024,Terapagos,90,65,85,65,85,60,Normal,,9.0,1.0
1024,1607,1025,Pecharunt,88,88,160,88,88,88,Poison,Ghost,9.0,0.0


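We can now see that the table has 1025 rows, which matches the number of pokémon that exist. The generation and legendary columns no longer contain null values (Type2 is still empty for single-type pokémon such as Terapagos), and all the columns hold the information we need.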
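A quick way to check a deduplication of this kind is to compare `COUNT(*)` with `COUNT(DISTINCT ID)`: the two agree only when every ID appears exactly once. The sketch below applies the same `MIN(rowid)` pattern to a hypothetical toy table (`dedupe_toy`), not to the real data.

```python
import sqlite3

# Hypothetical toy table with three forms sharing ID 898
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE dedupe_toy (ID INTEGER, Name TEXT)')
con.executemany('INSERT INTO dedupe_toy VALUES (?, ?)',
                [(898, 'Calyrex'), (898, 'Ice Rider Calyrex'),
                 (898, 'Shadow Rider Calyrex'), (1025, 'Pecharunt')])

# Keep only the first-inserted row (lowest rowid) for each ID
con.execute('''
DELETE FROM dedupe_toy
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM dedupe_toy
    GROUP BY ID
);
''')
con.commit()

total, distinct = con.execute(
    'SELECT COUNT(*), COUNT(DISTINCT ID) FROM dedupe_toy').fetchone()
kept = con.execute('SELECT ID, Name FROM dedupe_toy ORDER BY ID').fetchall()
print(total, distinct, kept)
```

Note that this pattern keeps whichever row was inserted first for each ID, so the surviving row is the base form only if the base form comes first in the table.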