# PokéData: A Data-Driven Victory

__Context:__ Professor Oak's hypothesis is that pokémon have been getting stronger with each generation, but some of his colleagues disagree. With the pokémon championship battle approaching, he wants to use the event to test his theory. He tasks a promising young trainer with fighting the undefeated champion Cynthia using a carefully constructed team of 6 pokémon, and bringing an end to her win streak.

<div style="text-align: center;">
    <img src="Images/cynthia_team.png" alt="cynthia team" width="800">
</div>

__Objective:__ Create a team of 6 non-legendary pokémon that can defeat Cynthia's team, and analyse trends in the data in order to assess Professor Oak's hypothesis.

## Setup: Cleaning the Data

In [1]:
# import libraries

import pandas as pd
import sqlite3

In [2]:
# loading dataset

df = pd.read_csv('C:/Users/danny/Documents/Projects/PokeData/MainStats.csv')
df.tail()

Unnamed: 0,ID,Name,Total,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,Height,Weight
1204,1021,Raging Bolt,590,125,73,91,137,89,75,Electric,Dragon,,
1205,1022,Iron Boulder,590,90,120,80,68,108,124,Rock,Psychic,,
1206,1023,Iron Crown,590,90,72,100,122,108,98,Steel,Psychic,,
1207,1024,Terapagos,450,90,65,85,65,85,60,Normal,,,
1208,1025,Pecharunt,600,88,88,160,88,88,88,Poison,Ghost,,


This data is missing some key information that we will need such as the generation of each Pokémon and their legendary status. To fix this, we need to find the relevant data and append it to the above dataset.

In [3]:
# loading other dataset

df2 = pd.read_csv('C:/Users/danny/Documents/Projects/PokeData/SecondaryStats.csv')
df2.tail()

Unnamed: 0,number,name,type1,type2,total,hp,attack,defense,sp_attack,sp_defense,speed,generation,legendary
1067,896,Glastrier,Ice,,580,100,145,130,65,110,30,8,True
1068,897,Spectrier,Ghost,,580,100,65,60,145,80,130,8,True
1069,898,Calyrex,Psychic,Grass,500,100,80,80,80,80,80,8,True
1070,898,Ice Rider Calyrex,Psychic,Ice,680,100,165,150,85,130,50,8,True
1071,898,Shadow Rider Calyrex,Psychic,Ghost,680,100,85,80,165,100,150,8,True


In [4]:
# connecting to database and creating SQL tables

cnn = sqlite3.connect(':memory:')

df.to_sql('main_stats', cnn)
df2.to_sql('secondary_stats', cnn)

1072

In [5]:
# join tables 

query = '''
SELECT main_stats.ID, main_stats.Name, main_stats.HP, main_stats.Attack, main_stats.Defense, main_stats.SpAtk, main_stats.SpDef, main_stats.Speed,
main_stats.Type1, main_stats.Type2, secondary_stats.generation, secondary_stats.legendary
FROM main_stats
LEFT JOIN secondary_stats
ON main_stats.ID = secondary_stats.number;
'''

result = pd.read_sql_query(query, con=cnn)
result.tail()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
1603,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,,
1604,1022,Iron Boulder,90,120,80,68,108,124,Rock,Psychic,,
1605,1023,Iron Crown,90,72,100,122,108,98,Steel,Psychic,,
1606,1024,Terapagos,90,65,85,65,85,60,Normal,,,
1607,1025,Pecharunt,88,88,160,88,88,88,Poison,Ghost,,


The two tables have a different number of entries, so the generation and legendary columns in the joined result contain missing values for every pokémon that has no match in the secondary table. Let's fix this.
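The effect of the join can be reproduced on a toy example. The sketch below uses made-up miniature tables (not the real CSV files) to show how a `LEFT JOIN` leaves nulls for IDs with no match and also multiplies rows when the right-hand table lists several forms under one number. The table contents and the `demo_cnn` name are assumptions for illustration only.

```python
import sqlite3
import pandas as pd

# Hypothetical miniature versions of the two tables (illustrative values only)
main = pd.DataFrame({"ID": [898, 1021], "Name": ["Calyrex", "Raging Bolt"]})
secondary = pd.DataFrame({
    "number": [898, 898, 898],
    "name": ["Calyrex", "Ice Rider Calyrex", "Shadow Rider Calyrex"],
    "generation": [8, 8, 8],
})

demo_cnn = sqlite3.connect(":memory:")
main.to_sql("main_stats", demo_cnn, index=False)
secondary.to_sql("secondary_stats", demo_cnn, index=False)

joined = pd.read_sql_query(
    """
    SELECT m.ID, m.Name, s.generation
    FROM main_stats AS m
    LEFT JOIN secondary_stats AS s ON m.ID = s.number
    """,
    con=demo_cnn,
)

# IDs with no match in the secondary table end up with a null generation
missing_generation = int(joined["generation"].isna().sum())
# Extra rows created because one ID matched several forms
duplicated_ids = int(joined.duplicated(subset="ID").sum())

print(joined)
print(missing_generation, duplicated_ids)
```

Both effects matter for the cleaning steps: the nulls have to be filled in, and the repeated IDs have to be removed.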

In [6]:
# update tables to show generation and legendary status for all null values

result.to_sql('result', cnn)

query = '''
UPDATE result
SET legendary = True
WHERE ID IN (1001, 1002, 1003, 1004, 1007, 1008, 1014, 1015, 1016, 1017, 1024);
'''

query2 = '''
UPDATE result
SET legendary = False
WHERE legendary IS NULL
'''

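# IDs above 898 are missing from secondary_stats, so they are all tagged as generation 9 here
# (IDs 899 to 905 actually belong to generation 8, which is corrected further down)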
query3 = '''
UPDATE result
SET generation = 9
WHERE ID > 898
'''

cnn.execute(query)
cnn.execute(query2)
cnn.execute(query3)
cnn.commit()

test_query = '''
SELECT *
FROM result;
'''

filtered_result = pd.read_sql_query(test_query, con=cnn)
filtered_result.tail()

Unnamed: 0,index,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
1603,1603,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,9.0,0.0
1604,1604,1022,Iron Boulder,90,120,80,68,108,124,Rock,Psychic,9.0,0.0
1605,1605,1023,Iron Crown,90,72,100,122,108,98,Steel,Psychic,9.0,0.0
1606,1606,1024,Terapagos,90,65,85,65,85,60,Normal,,9.0,1.0
1607,1607,1025,Pecharunt,88,88,160,88,88,88,Poison,Ghost,9.0,0.0


In [7]:
# deleting duplicate values

delete = '''
DELETE FROM result
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM result
    GROUP BY ID
);
'''

cnn.execute(delete)
cnn.commit()

test_delete = '''
SELECT *
FROM result;
'''

pokemon = pd.read_sql_query(test_delete, con=cnn)
pokemon.to_sql('pokemon', cnn, if_exists='replace', index=False)
pokemon.tail()

Unnamed: 0,index,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
1020,1603,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,9.0,0.0
1021,1604,1022,Iron Boulder,90,120,80,68,108,124,Rock,Psychic,9.0,0.0
1022,1605,1023,Iron Crown,90,72,100,122,108,98,Steel,Psychic,9.0,0.0
1023,1606,1024,Terapagos,90,65,85,65,85,60,Normal,,9.0,1.0
1024,1607,1025,Pecharunt,88,88,160,88,88,88,Poison,Ghost,9.0,0.0


We can now see that the table has 1025 rows, which matches the number of pokémon that exist. The generation and legendary columns no longer contain null values, and all the columns have the information that we need. The data is now cleaned and ready for us to work with.
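Claims like these can also be checked in code rather than by eye. Below is a minimal sketch of such a check. The `summarise_cleaning` helper and the toy frame are assumptions made for illustration; in the notebook the function would be called on the `pokemon` DataFrame instead.

```python
import pandas as pd

def summarise_cleaning(frame: pd.DataFrame) -> dict:
    """Return a few sanity checks for a cleaned pokémon table."""
    return {
        "rows": len(frame),
        "unique_ids": int(frame["ID"].nunique()),
        "null_generation": int(frame["generation"].isna().sum()),
        "null_legendary": int(frame["legendary"].isna().sum()),
    }

# Toy frame for illustration only
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "generation": [1.0, 1.0, 1.0],
    "legendary": [0.0, 0.0, 0.0],
})

checks = summarise_cleaning(toy)
print(checks)
```

For a fully cleaned table, `rows` and `unique_ids` should be equal and both null counts should be zero.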

## Intro: Querying the Data

In [8]:
# number of pokemon in each generation

query = '''
SELECT COUNT(ID) as pokemon_count, generation
FROM pokemon
GROUP BY generation;
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

   pokemon_count  generation
0              2         0.0
1            151         1.0
2            100         2.0
3            135         3.0
4            107         4.0
5            156         5.0
6             72         6.0
7             86         7.0
8             89         8.0
9            127         9.0


We have found some errors in the data. Two pokémon are classified as generation 0, which doesn't exist. Also, generations 8 and 9 do not have the correct count. Let's fix this.
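One way to cross-check generation labels is to derive them from the Pokédex number. The sketch below assumes the usual National Pokédex boundaries for generations 1 to 9; these boundaries are not part of the dataset, although they agree with the per-generation counts this notebook arrives at after its fixes. The names `GENERATION_LAST_ID` and `generation_from_id` are made up for this example.

```python
import pandas as pd

# Assumed last National Pokédex number of each generation (generations 1 to 9)
GENERATION_LAST_ID = [151, 251, 386, 493, 649, 721, 809, 905, 1025]

def generation_from_id(ids: pd.Series) -> pd.Series:
    """Map Pokédex numbers to generations using the ID boundaries above."""
    bins = [0] + GENERATION_LAST_ID
    labels = list(range(1, len(GENERATION_LAST_ID) + 1))
    # Each bin is (previous boundary, boundary], so 151 falls in generation 1
    return pd.cut(ids, bins=bins, labels=labels).astype(int)

sample_ids = pd.Series([1, 808, 809, 899, 905, 906, 1025])
expected_generation = generation_from_id(sample_ids).tolist()
print(expected_generation)

# Expected size of each generation, derived from the same boundaries
expected_counts = [
    last - first for first, last in zip([0] + GENERATION_LAST_ID, GENERATION_LAST_ID)
]
print(expected_counts)
```

Comparing `expected_counts` with the output of the `GROUP BY generation` query is a quick way to see which generations are mislabelled.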

In [9]:
# finding the error

query = '''
SELECT ID, Name
FROM pokemon
WHERE generation = 0;
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name
0,808,Meltan
1,809,Melmetal


In [10]:
# fixing the problem

query = '''
UPDATE pokemon
SET generation = 7
WHERE ID IN (808, 809)
'''

cnn.execute(query)
cnn.commit()

test_query = '''
SELECT ID, Name, generation
FROM pokemon
WHERE ID IN (808, 809)
'''

filtered_result = pd.read_sql_query(test_query, con=cnn)
filtered_result.tail()

Unnamed: 0,ID,Name,generation
0,808,Meltan,7.0
1,809,Melmetal,7.0


In [11]:
# finding the error

query = '''
SELECT ID, Name, generation
FROM pokemon
WHERE generation = 9
LIMIT 10;
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

    ID                      Name  generation
0  899                   Wyrdeer         9.0
1  900                   Kleavor         9.0
2  901                  Ursaluna         9.0
3  902          Basculegion Male         9.0
4  903                  Sneasler         9.0
5  904                  Overqwil         9.0
6  905  Enamorus Incarnate Forme         9.0
7  906                Sprigatito         9.0
8  907                 Floragato         9.0
9  908               Meowscarada         9.0


In [12]:
# fixing the error
# the pokemon with id ranging from 899 to 905 are classified as generation 9 rather than generation 8

query = '''
UPDATE pokemon
SET generation = 8
WHERE ID >= 899 AND ID <= 905
'''

cnn.execute(query)
cnn.commit()

test_query = '''
SELECT ID, Name, generation
FROM pokemon
WHERE ID >= 899 AND ID <= 905
'''

filtered_result = pd.read_sql_query(test_query, con=cnn)
print(filtered_result)

    ID                      Name  generation
0  899                   Wyrdeer         8.0
1  900                   Kleavor         8.0
2  901                  Ursaluna         8.0
3  902          Basculegion Male         8.0
4  903                  Sneasler         8.0
5  904                  Overqwil         8.0
6  905  Enamorus Incarnate Forme         8.0


In [13]:
# rerunning the query to find count of all pokemon per generation

query = '''
SELECT COUNT(ID) AS pokemon_count, generation
FROM pokemon
GROUP BY generation;
'''

result = pd.read_sql_query(query, con=cnn)
print(result)

# save cleaned data to use with power BI
pokemon_df = pd.read_sql_query('SELECT * FROM pokemon', con=cnn)
pokemon_df.to_csv('cleaned_data.csv', index=False)

   pokemon_count  generation
0            151         1.0
1            100         2.0
2            135         3.0
3            107         4.0
4            156         5.0
5             72         6.0
6             88         7.0
7             96         8.0
8            120         9.0


In [14]:
# Create a new table without the unwanted column
cnn.execute('''
CREATE TABLE temp_pokemon AS
SELECT ID, Name, HP, Attack, Defense, SpAtk, SpDef, Speed, Type1, Type2, generation, legendary
FROM pokemon;
''')

# Drop the old pokemon table
cnn.execute('DROP TABLE pokemon;')

# Rename the new table
cnn.execute('ALTER TABLE temp_pokemon RENAME TO pokemon;')

# Commit the changes
cnn.commit()

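# note: `pokemon` here is still the DataFrame loaded in cell 7, not the rebuilt SQL table,
# so the output below still shows the old index column
pokemon.head()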

Unnamed: 0,index,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,0,1,Bulbasaur,45,49,49,65,65,45,Grass,Poison,1.0,0.0
1,1,2,Ivysaur,60,62,63,80,80,60,Grass,Poison,1.0,0.0
2,2,3,Venusaur,80,82,83,100,100,80,Grass,Poison,1.0,0.0
3,8,4,Charmander,39,52,43,60,50,65,Fire,,1.0,0.0
4,9,5,Charmeleon,58,64,58,80,65,80,Fire,,1.0,0.0


In [15]:
# finding the count of pokemon per type

query = '''
WITH CombinedTypes AS (
    SELECT Type1 AS type
    FROM pokemon
    UNION ALL
    SELECT Type2 AS type
    FROM pokemon
    WHERE Type2 IS NOT NULL
)

SELECT type, COUNT(*) AS pokemon_count
FROM CombinedTypes
GROUP BY type;
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

        type  pokemon_count
0        Bug             92
1       Dark             69
2     Dragon             70
3   Electric             69
4      Fairy             64
5   Fighting             73
6       Fire             82
7     Flying            109
8      Ghost             65
9      Grass            127
10    Ground             75
11       Ice             48
12    Normal            131
13    Poison             83
14   Psychic            102
15      Rock             74
16     Steel             65
17     Water            154


In [16]:
# finding the average total stats for each generation

query = '''
SELECT 
    generation, 
    ROUND(AVG(total_stats), 2) AS avg_total_stats
FROM (
    SELECT 
        generation, 
        (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats
    FROM 
        pokemon
) AS stats_per_pokemon
GROUP BY 
    generation;
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

   generation  avg_total_stats
0         1.0           407.64
1         2.0           407.18
2         3.0           403.73
3         4.0           445.57
4         5.0           425.76
5         6.0           429.31
6         7.0           449.41
7         8.0           439.22
8         9.0           457.39


In [17]:
# average of each stat per generation

query = '''
SELECT generation, ROUND(AVG(HP) , 2) AS avg_hp, ROUND(AVG(Attack) , 2) AS avg_attack, ROUND(AVG(Defense) , 2) AS avg_defense,
ROUND(AVG(SpAtk) , 2) AS avg_spatk, ROUND(AVG(SpDef) , 2) AS avg_spdef, ROUND(AVG(Speed), 2) AS avg_speed 
FROM pokemon
GROUP BY generation;
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

   generation  avg_hp  avg_attack  avg_defense  avg_spatk  avg_spdef  \
0         1.0   64.21       72.91        68.23      67.14      66.09   
1         2.0   70.98       68.26        69.69      64.50      72.34   
2         3.0   65.67       73.11        69.01      67.86      66.47   
3         4.0   73.10       80.21        75.11      73.28      74.38   
4         5.0   70.31       81.03        71.24      69.24      67.33   
5         6.0   68.92       72.50        75.08      72.54      74.58   
6         7.0   71.01       84.77        78.73      74.95      74.57   
7         8.0   72.82       82.91        73.28      71.65      69.61   
8         9.0   77.39       82.43        76.78      72.87      72.48   

   avg_speed  
0      69.07  
1      61.41  
2      61.61  
3      69.48  
4      66.60  
5      65.68  
6      65.38  
7      68.95  
8      75.45  


In [18]:
# data visualisation via Power BI

from IPython.display import IFrame
IFrame(src="https://app.powerbi.com/reportEmbed?reportId=3bec943c-c6ee-4d34-afe1-bec9c11c5122&autoAuth=true&ctid=2efd699a-1922-4e69-b601-108008d28a2e",
       height = 636, width = 1300)

<div style="text-align: center;">
    <img src="Images/intro.png" alt="intro" width="1200">
</div>

We can see from the data that there is no clear trend in the number of pokémon introduced per generation. The pie chart shows that generation 5 has the most pokémon with 156, while generation 6 has the fewest with just 72.

If we look at the average total stats per generation, we can see an overall upward trend, which suggests that pokémon do seem to be getting stronger from generation to generation. Generation 1 has an average of 407.64 while generation 9 has an average of 457.39. The rise is not steady, though: the first 3 generations differ very little, with generation 3 showing the lowest average stat total of 403.73, and generation 4 (445.57) sits above generations 5, 6 and 8. Generation 9 has the highest average.
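The strength of this trend can be put into numbers. The sketch below is an illustrative check rather than part of the original analysis: it fits a straight line to the nine per-generation averages printed above and computes their Pearson correlation with the generation number. With only nine data points, the result should be read as a rough indication.

```python
import numpy as np

generations = np.arange(1, 10)
# Average stat totals per generation, copied from the query output above
avg_total_stats = np.array(
    [407.64, 407.18, 403.73, 445.57, 425.76, 429.31, 449.41, 439.22, 457.39]
)

# Least-squares line: the slope is the average change in stat total per generation
slope, intercept = np.polyfit(generations, avg_total_stats, deg=1)
# Pearson correlation between generation number and average stat total
pearson_r = np.corrcoef(generations, avg_total_stats)[0, 1]

print(f"slope per generation: {slope:.2f}")
print(f"Pearson r: {pearson_r:.2f}")
```

A positive slope and a correlation close to 1 would both point in the direction of the hypothesis.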

We can also see how each stat's average has changed over time. Every stat is higher in generation 9 than in generation 1. The HP stat has seen the largest increase of 13.18 (64.21 to 77.39), with attack second at 9.52 and defense third at 8.55. The other stats only increased by 5 to 7 points. You may argue that the 'strength' of a pokémon is only determined by its attack or special attack stats, but it's a bit more complex than that. Almost all the stats are important for a pokémon to be considered strong. There is no point in having a high attack stat if the defense and speed stats are low, because the pokémon may be knocked out before it even gets to attack! High total stats are important to truly be considered strong.
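The per-stat changes quoted here follow directly from the table of averages. As a small worked example, the sketch below recomputes them from the generation 1 and generation 9 rows of that output (the `per_stat` frame is just those two rows typed in by hand).

```python
import pandas as pd

# Generation 1 and generation 9 averages copied from the per-stat output above
per_stat = pd.DataFrame(
    {
        "avg_hp": [64.21, 77.39],
        "avg_attack": [72.91, 82.43],
        "avg_defense": [68.23, 76.78],
        "avg_spatk": [67.14, 72.87],
        "avg_spdef": [66.09, 72.48],
        "avg_speed": [69.07, 75.45],
    },
    index=[1, 9],
)

# Change per stat between generation 1 and generation 9, largest first
change = (per_stat.loc[9] - per_stat.loc[1]).round(2).sort_values(ascending=False)
print(change)
```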

Stats alone, however, don't necessarily prove Professor Oak's hypothesis, as there is a second determining factor: the move pool of a pokémon (i.e. the strength of the moves that it can learn). While we don't have access to move data, we can make an estimate based on typing and do external research to see what moves the pokémon we select can learn.

We can also see that the number of pokémon of each type varies widely. Water types are the most common with 154, whereas ice types are the least common with only 48. That makes water types over 3 times as common.

## Choosing The Team

To select a team to defeat Cynthia, we must first examine all of her pokémon individually to find counters to them. The criteria for each pokémon are:

- The total stats must be at least equal to those of Cynthia's pokémon
- The pokémon cannot be a legendary
- The pokémon's type must be super effective against Cynthia's pokémon (or neutral if it has no weaknesses)
- The pokémon's type must not be weak to the move types of Cynthia's pokémon
- Its Attack stat must be higher than the enemy's Defense, or its SpAtk must be higher than the enemy's SpDef
- The chosen pokémon cannot be one of Cynthia's pokémon
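The criteria above translate into one SQL filter whose values change per opponent. As a rough sketch, they could be wrapped in a single parameterised helper. The `find_counters` function and its parameter names are hypothetical, and the demo table holds only three rows copied from outputs in this notebook; the notebook itself writes each query out by hand.

```python
import sqlite3

CYNTHIA_TEAM = ("Spiritomb", "Roserade", "Togekiss", "Lucario", "Milotic", "Garchomp")
TOTAL = "(HP + Attack + Defense + SpAtk + SpDef + Speed)"

def find_counters(cnn, min_total, avoid_types, want_types, enemy_defense, enemy_spdef):
    """Hypothetical helper: return (Name, total_stats) rows meeting the criteria."""
    avoid = ", ".join("?" for _ in avoid_types)
    want = ", ".join("?" for _ in want_types)
    team = ", ".join("?" for _ in CYNTHIA_TEAM)
    sql = f"""
        SELECT Name, {TOTAL} AS total_stats
        FROM pokemon
        WHERE {TOTAL} >= ?
          AND Type1 NOT IN ({avoid})
          AND Type2 NOT IN ({avoid})
          AND (Type1 IN ({want}) OR Type2 IN ({want}))
          AND legendary = 0
          AND Name NOT IN ({team})
          AND (Attack > ? OR SpAtk > ?)
        ORDER BY total_stats DESC
    """
    # Parameters are listed in the same order as the placeholders above
    params = [min_total, *avoid_types, *avoid_types, *want_types, *want_types,
              *CYNTHIA_TEAM, enemy_defense, enemy_spdef]
    return cnn.execute(sql, params).fetchall()

# Tiny demo table with three rows taken from outputs shown in this notebook
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE pokemon (Name TEXT, HP INT, Attack INT, Defense INT, SpAtk INT, "
    "SpDef INT, Speed INT, Type1 TEXT, Type2 TEXT, legendary INT)"
)
demo.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Primarina", 80, 74, 74, 126, 116, 60, "Water", "Fairy", 0),
        ("Spiritomb", 50, 92, 108, 92, 108, 35, "Ghost", "Dark", 0),
        ("Magnezone", 70, 70, 115, 130, 90, 60, "Electric", "Steel", 0),
    ],
)

counters = find_counters(
    demo,
    min_total=485,
    avoid_types=["Psychic", "Ghost", "Fighting", "Poison", "Grass", "Dark"],
    want_types=["Fairy"],
    enemy_defense=108,
    enemy_spdef=108,
)
print(counters)
```

Using `?` placeholders keeps the type lists and thresholds out of the SQL text, so the same function can be reused for each of Cynthia's pokémon.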

### Spiritomb

<img src="Images/spiritomb.png" alt="Spiritomb" width="300"/>

Her first pokémon is Spiritomb, a ghost/dark type. What's interesting about this type combination is that it had no weaknesses until generation 6, which introduced fairy types. Ghost types are weak to dark and ghost type moves, but dark types resist both dark and ghost, so the two cancel out. Here is a graphic of the type matchup chart to make things clearer:

<div style="text-align: center;">
    <img src="Images/type_chart.png" alt="Type Chart" width="600">
</div>
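The cancelling-out described for Spiritomb is simple multiplication: the damage multiplier against a dual-type pokémon is the product of the multipliers against each of its types. The sketch below works this out with a hand-written partial chart that lists only the pairs needed here; the `CHART` dictionary and `multiplier` function are illustrative and not part of the dataset.

```python
# Partial type chart: (attacking type, defending type) -> multiplier.
# Only the pairs needed for Spiritomb are listed; missing pairs default to 1.0.
CHART = {
    ("Ghost", "Ghost"): 2.0,
    ("Ghost", "Dark"): 0.5,
    ("Dark", "Ghost"): 2.0,
    ("Dark", "Dark"): 0.5,
    ("Fairy", "Dark"): 2.0,
}

def multiplier(attack_type, defender_types):
    """Multiply the effectiveness of one attacking type across all defender types."""
    result = 1.0
    for defender_type in defender_types:
        result *= CHART.get((attack_type, defender_type), 1.0)
    return result

spiritomb = ("Ghost", "Dark")
effectiveness = {t: multiplier(t, spiritomb) for t in ("Ghost", "Dark", "Fairy")}
print(effectiveness)
```

Ghost and dark moves both come out at 1.0 (2.0 x 0.5), while fairy moves come out at 2.0, which is why fairy is the only weakness.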

In [19]:
# Spiritomb stats

query = '''
SELECT *
FROM pokemon
WHERE name = 'Spiritomb'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,442,Spiritomb,50,92,108,92,108,35,Ghost,Dark,4.0,0.0


From Spiritomb's information we can see that its total stats are equal to 485. Let's filter the table to satisfy all the criteria.

In [20]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE total_stats >= 485 
AND type1 NOT IN ('Psychic', 'Ghost', 'Fighting', 'Poison', 'Grass', 'Dark')
AND type2 NOT IN ('Psychic', 'Ghost', 'Fighting', 'Poison', 'Grass', 'Dark')
AND (type1 = 'Fairy' OR type2 = 'Fairy')
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 108 OR SpAtk > 108); 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,total_stats,Type1,Type2,generation,legendary
0,730,Primarina,530,Water,Fairy,7.0,0.0
1,905,Enamorus Incarnate Forme,580,Fairy,Flying,8.0,0.0


After filtering the data we can see there are 2 different options. However, there is an error in the data: Enamorus is actually a legendary pokémon, but that isn't reflected in the legendary column. Therefore the only pokémon that satisfies the criteria is Primarina.
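One caveat about the filter above is worth illustrating. In SQL, `NULL NOT IN (...)` evaluates to NULL rather than true, so if `Type2` is stored as NULL for single-type pokémon (as the earlier `WHERE Type2 IS NOT NULL` filter suggests), the condition `type2 NOT IN (...)` silently drops every single-type pokémon. The sketch below shows the behaviour on two made-up rows, together with one possible workaround using `COALESCE`. It has not been re-run against the real table, so whether it would change the shortlist is left open.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE pokemon (Name TEXT, Type1 TEXT, Type2 TEXT)")
demo.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [("Single", "Fairy", None), ("Dual", "Water", "Fairy")],  # made-up rows
)

# Same shape as the notebook's filter: the single-type row is silently dropped
strict = demo.execute(
    "SELECT Name FROM pokemon WHERE Type2 NOT IN ('Ghost', 'Dark') ORDER BY Name"
).fetchall()

# Treating a missing second type as an empty string keeps single-type rows
relaxed = demo.execute(
    "SELECT Name FROM pokemon "
    "WHERE COALESCE(Type2, '') NOT IN ('Ghost', 'Dark') ORDER BY Name"
).fetchall()

print(strict)
print(relaxed)
```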

In [21]:
# Primarina's data

query = '''
SELECT *
FROM pokemon
WHERE name = 'Primarina'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,730,Primarina,80,74,74,126,116,60,Water,Fairy,7.0,0.0


### Roserade

<img src="Images/roserade.png" alt="Roserade" width="300"/>

Roserade is her next pokémon, a grass/poison type. This typing is weak to flying, fire, psychic and ice. It has grass, poison and psychic type moves which are super effective against the following types: ground, rock, water, grass, fairy, fighting, poison.

In [22]:
# Roserade stats

query = '''
SELECT *
FROM pokemon
WHERE name = 'Roserade'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,407,Roserade,60,70,65,125,105,90,Grass,Poison,4.0,0.0


Roserade's total stats amount to 515.

In [28]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE total_stats >= 515 
AND type1 NOT IN ('Ground', 'Rock', 'Water', 'Grass', 'Fairy', 'Fighting', 'Poison')
AND type2 NOT IN ('Ground', 'Rock', 'Water', 'Grass', 'Fairy', 'Fighting', 'Poison')
AND (type1 IN ('Flying', 'Fire', 'Psychic', 'Ice') OR type2 IN ('Flying', 'Fire', 'Psychic', 'Ice'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 65 OR SpAtk > 105)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

      ID          Name  total_stats   Type1    Type2  generation  legendary
0    149     Dragonite          600  Dragon   Flying         1.0        0.0
1    373     Salamence          600  Dragon   Flying         3.0        0.0
2    376     Metagross          600   Steel  Psychic         3.0        0.0
3    998    Baxcalibur          600  Dragon      Ice         9.0        0.0
4   1020  Gouging Fire          590    Fire   Dragon         9.0        0.0
5   1023    Iron Crown          590   Steel  Psychic         9.0        0.0
6    797    Celesteela          570   Steel   Flying         7.0        0.0
7    806   Blacephalon          570    Fire    Ghost         7.0        0.0
8    993  Iron Jugulis          570    Dark   Flying         9.0        0.0
9    637     Volcarona          550     Bug     Fire         5.0        0.0
10   715       Noivern          535  Flying   Dragon         6.0        0.0
11     6     Charizard          534    Fire   Flying         1.0        0.0
12   655    

There are 22 valid pokémon to choose from. The pokémon with the highest total stats are Dragonite, Salamence, Metagross and Baxcalibur, which all have a stat total of 600. To choose between them we have to compare their typing. The pokémon that resists Roserade's moves the best will be chosen.

__Case 1:__ Dragonite

- Dragon and flying both resist grass type moves (x0.25 combined)

__Case 2:__ Salamence

- Same typing as Dragonite

__Case 3:__ Metagross

- Poison does not affect steel types (x0)
- Steel resists grass (x0.5)
- Steel and psychic both resist psychic (x0.25 combined)

__Case 4:__ Baxcalibur

- Dragon resists grass (x0.5)

By comparing these we can see Metagross is the clear winner as it has immunity to poison and resists both of Roserade's other move types.
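The four cases can also be compared numerically. The sketch below is illustrative only: it uses a hand-written partial type chart limited to Roserade's three move types, and it ranks candidates by the sum of the multipliers they take, which is a scoring rule assumed here for convenience rather than one used elsewhere in this notebook.

```python
# Partial chart: (attacking type, defending type) -> multiplier; others default to 1.0
CHART = {
    ("Grass", "Dragon"): 0.5,
    ("Grass", "Flying"): 0.5,
    ("Grass", "Steel"): 0.5,
    ("Poison", "Steel"): 0.0,
    ("Psychic", "Steel"): 0.5,
    ("Psychic", "Psychic"): 0.5,
}
ROSERADE_MOVE_TYPES = ("Grass", "Poison", "Psychic")
CANDIDATES = {
    "Dragonite": ("Dragon", "Flying"),
    "Salamence": ("Dragon", "Flying"),
    "Metagross": ("Steel", "Psychic"),
    "Baxcalibur": ("Dragon", "Ice"),
}

def damage_taken(defender_types):
    """Return the multiplier each of Roserade's move types has against a defender."""
    taken = {}
    for move_type in ROSERADE_MOVE_TYPES:
        value = 1.0
        for defender_type in defender_types:
            value *= CHART.get((move_type, defender_type), 1.0)
        taken[move_type] = value
    return taken

table = {name: damage_taken(types) for name, types in CANDIDATES.items()}
# Assumed scoring rule: the lower the summed multipliers, the better the resistance
best = min(table, key=lambda name: sum(table[name].values()))

for name, taken in table.items():
    print(name, taken)
print("best:", best)
```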

In [29]:
# Metagross's data

query = '''
SELECT *
FROM pokemon
WHERE name = 'Metagross'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,376,Metagross,80,135,130,95,90,70,Steel,Psychic,3.0,0.0


### Togekiss

<img src="Images/togekiss.png" alt="Togekiss" width="300"/>

Togekiss is a normal/flying type in Cynthia's generation 4 team (the dataset lists its current fairy/flying typing, which it gained in generation 6). The normal/flying typing is weak to rock, electric and ice. Its moves are flying, fighting, water and electric, which are super effective against 12 different types: fighting, bug, grass, normal, rock, steel, ice, dark, ground, fire, flying and water.

In [31]:
# Togekiss' stats

query = '''
SELECT *
FROM pokemon
WHERE name = 'Togekiss'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,468,Togekiss,85,50,95,120,115,80,Fairy,Flying,4.0,0.0


Total stats = 545

In [33]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE total_stats >= 545 
AND type1 NOT IN ('Ground', 'Rock', 'Water', 'Grass', 'Bug', 'Fighting', 'Normal', 'Steel', 'Ice', 'Dark', 'Fire', 'Flying')
AND type2 NOT IN ('Ground', 'Rock', 'Water', 'Grass', 'Bug', 'Fighting', 'Normal', 'Steel', 'Ice', 'Dark', 'Fire', 'Flying')
AND (type1 IN ('Rock', 'Electric', 'Ice') OR type2 IN ('Rock', 'Electric', 'Ice'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 95 OR SpAtk > 115)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,total_stats,Type1,Type2,generation,legendary
0,1021,Raging Bolt,590,Electric,Dragon,9.0,0.0


Raging Bolt is the only pokemon that fits the criteria.

In [34]:
# Raging Bolt's data

query = '''
SELECT *
FROM pokemon
WHERE name = 'Raging Bolt'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,1021,Raging Bolt,125,73,91,137,89,75,Electric,Dragon,9.0,0.0


### Lucario

<img src="Images/lucario.png" alt="Lucario" width="300"/>

Lucario is a fighting/steel type pokemon. It is weak to fighting, ground and fire while its moves are fighting, normal, ghost and rock. These moves are strong against normal, rock, steel, ice, dark, ghost, psychic, flying, bug and fire.

In [35]:
# Lucario's stats

query = '''
SELECT *
FROM pokemon
WHERE name = 'Lucario'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,448,Lucario,70,110,70,115,70,90,Fighting,Steel,4.0,0.0


Total stats = 525

In [36]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE total_stats >= 525 
AND type1 NOT IN ('Normal', 'Rock', 'Steel', 'Ice', 'Dark', 'Ghost', 'Psychic', 'Flying', 'Bug' and 'Fire')
AND type2 NOT IN ('Normal', 'Rock', 'Steel', 'Ice', 'Dark', 'Ghost', 'Psychic', 'Flying', 'Bug' and 'Fire')
AND (type1 IN ('Fighting', 'Ground', 'Fire') OR type2 IN ('Fighting', 'Ground', 'Fire'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 70 OR SpAtk > 70)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

      ID          Name  total_stats     Type1     Type2  generation  legendary
0    784       Kommo-o          600    Dragon  Fighting         7.0        0.0
1   1006  Iron Valiant          590     Fairy  Fighting         9.0        0.0
2   1020  Gouging Fire          590      Fire    Dragon         9.0        0.0
3    794      Buzzwole          570       Bug  Fighting         7.0        0.0
4    795     Pheromosa          570       Bug  Fighting         7.0        0.0
5    984    Great Tusk          570    Ground  Fighting         9.0        0.0
6    988  Slither Wing          570       Bug  Fighting         9.0        0.0
7    989  Sandy Shocks          570  Electric    Ground         9.0        0.0
8    992    Iron Hands          570  Fighting  Electric         9.0        0.0
9    994     Iron Moth          570      Fire    Poison         9.0        0.0
10   637     Volcarona          550       Bug      Fire         5.0        0.0
11   260      Swampert          535     Water    Gro

The query returns 18 pokémon, and one of them has a higher stat total than the others: Kommo-o with 600.
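One detail of the query above explains why Bug and Fire types such as Buzzwole and Gouging Fire still appear in the output. Inside the `NOT IN` lists, `'Bug' and 'Fire'` is not two list entries: SQLite evaluates it as a single boolean expression, so neither type is actually excluded. Kommo-o is neither a Bug nor a Fire type, so the choice is unaffected. The sketch below demonstrates the behaviour on literals only, without touching the real table.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# 'Bug' AND 'Fire' is a boolean expression; neither string is numeric, so it is 0
combined = demo.execute("SELECT 'Bug' AND 'Fire'").fetchone()[0]

# With that expression in the list, a Bug type still passes the NOT IN filter
passes_with_and = demo.execute(
    "SELECT 'Bug' NOT IN ('Flying', 'Bug' AND 'Fire')"
).fetchone()[0]

# Listing the two types separately excludes it as intended
passes_with_comma = demo.execute(
    "SELECT 'Bug' NOT IN ('Flying', 'Bug', 'Fire')"
).fetchone()[0]

print(combined, passes_with_and, passes_with_comma)
```

Writing the list as `'Bug', 'Fire'` would apply the exclusion as the criteria intend.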

In [37]:
# Kommo-o's data

query = '''
SELECT *
FROM pokemon
WHERE name = 'Kommo-o'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,784,Kommo-o,75,110,125,100,105,85,Dragon,Fighting,7.0,0.0


### Milotic

<img src="Images/milotic.png" alt="Milotic" width="300"/>

Milotic is a water type pokemon with only 2 weaknesses, grass and electric. Its moves are water, ice, psychic and dragon type. These are strong against: ground, rock, fire, flying, grass, dragon, fighting, poison.

In [39]:
# Milotic's stats

query = '''
SELECT *
FROM pokemon
WHERE name = 'Milotic'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,350,Milotic,95,60,79,100,125,81,Water,,3.0,0.0


Total stats = 540

In [51]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE total_stats >= 540 
AND type1 NOT IN ('Rock', 'Flying', 'Ground', 'Fire', 'Grass', 'Dragon', 'Fighting', 'Poison')
AND type2 NOT IN ('Rock', 'Flying', 'Ground', 'Fire', 'Grass', 'Dragon', 'Fighting', 'Poison')
AND (type1 IN ('Grass', 'Electric') OR type2 IN ('Grass', 'Electric'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 79 OR SpAtk > 125)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

Empty DataFrame
Columns: [ID, Name, total_stats, Type1, Type2, generation, legendary]
Index: []


No pokémon fits every criterion, so we can remove the total stat restriction to find the highest-stat pokémon that fits all the other criteria.
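This "strict first, then relax" step could be automated. The sketch below is one possible way to do it, using a hypothetical `best_counter` helper and a two-row demo table built from stats shown in this notebook. The filter is simplified to a type and SpAtk check for brevity, so it is not a full copy of the query above.

```python
import sqlite3

TOTAL = "(HP + Attack + Defense + SpAtk + SpDef + Speed)"

def best_counter(cnn, where_sql, params, min_total):
    """Hypothetical helper: apply the stat-total floor first, then drop it if nothing matches."""
    base = f"SELECT Name, {TOTAL} AS total_stats FROM pokemon WHERE {where_sql}"
    order = " ORDER BY total_stats DESC LIMIT 1"
    strict = cnn.execute(
        base + f" AND {TOTAL} >= ?" + order, [*params, min_total]
    ).fetchone()
    if strict is not None:
        return strict, True
    # Nothing met the floor, so return the best candidate without it
    return cnn.execute(base + order, params).fetchone(), False

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE pokemon (Name TEXT, HP INT, Attack INT, Defense INT, SpAtk INT, "
    "SpDef INT, Speed INT, Type1 TEXT, Type2 TEXT)"
)
demo.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Magnezone", 70, 70, 115, 130, 90, 60, "Electric", "Steel"),
        ("Milotic", 95, 60, 79, 100, 125, 81, "Water", None),
    ],
)

choice, met_floor = best_counter(
    demo,
    "(Type1 = ? OR Type2 = ?) AND SpAtk > ?",
    ["Electric", "Electric", 125],
    min_total=540,
)
print(choice, met_floor)
```

Returning a flag alongside the result makes it explicit when a pick only qualified after the constraint was relaxed.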

In [52]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE type1 NOT IN ('Rock', 'Flying', 'Ground', 'Fire', 'Grass', 'Dragon', 'Fighting', 'Poison')
AND type2 NOT IN ('Rock', 'Flying', 'Ground', 'Fire', 'Grass', 'Dragon', 'Fighting', 'Poison')
AND (type1 IN ('Grass', 'Electric') OR type2 IN ('Grass', 'Electric'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 79 OR SpAtk > 125)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

    ID                     Name  total_stats     Type1     Type2  generation  \
0  462                Magnezone          535  Electric     Steel         4.0   
1  881                Arctozolt          505  Electric       Ice         8.0   
2  738                 Vikavolt          500       Bug  Electric         7.0   
3  877  Morpeko Full Belly Mode          436  Electric      Dark         8.0   
4  777               Togedemaru          435  Electric     Steel         7.0   
5  737                Charjabug          400       Bug  Electric         7.0   

   legendary  
0        0.0  
1        0.0  
2        0.0  
3        0.0  
4        0.0  
5        0.0  


Magnezone has the highest stat total with 535 which is just 5 points below Milotic.

In [53]:
# Magnezone's data

query = '''
SELECT *
FROM pokemon
WHERE name = 'Magnezone'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,462,Magnezone,70,70,115,130,90,60,Electric,Steel,4.0,0.0


### Garchomp

<img src="Images/garchomp.png" alt="Garchomp" width="300"/>

Finally, we have Garchomp, Cynthia's ace pokémon. It is a dragon/ground type, which is weak to ice, dragon and fairy. Its moves are dragon, ground, fire and normal. These are strong against dragon, poison, rock, steel, fire, electric, bug, grass and ice.

In [54]:
# Garchomp's stats

query = '''
SELECT *
FROM pokemon
WHERE name = 'Garchomp'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,445,Garchomp,108,130,95,80,85,102,Dragon,Ground,4.0,0.0


Total stats = 600

In [57]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE total_stats >= 600 
AND type1 NOT IN ('Dragon', 'Poison', 'Rock', 'Steel', 'Fire', 'Electric', 'Bug', 'Grass', 'Ice')
AND type2 NOT IN ('Dragon', 'Poison', 'Rock', 'Steel', 'Fire', 'Electric', 'Bug', 'Grass', 'Ice')
AND (type1 IN ('Ice', 'Dragon', 'Fairy') OR type2 IN ('Ice', 'Dragon', 'Fairy'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 95 OR SpAtk > 85)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
print(result)

Empty DataFrame
Columns: [ID, Name, total_stats, Type1, Type2, generation, legendary]
Index: []


Again there are no pokemon that satisfy the constraints. If we remove the total stat constraint we get the following:

In [59]:
query = '''
SELECT ID, Name, (HP + Attack + Defense + SpAtk + SpDef + Speed) AS total_stats, Type1, Type2, generation, legendary
FROM pokemon
WHERE type1 NOT IN ('Dragon', 'Poison', 'Rock', 'Steel', 'Fire', 'Electric', 'Bug', 'Grass', 'Ice')
AND type2 NOT IN ('Dragon', 'Poison', 'Rock', 'Steel', 'Fire', 'Electric', 'Bug', 'Grass', 'Ice')
AND (type1 IN ('Ice', 'Dragon', 'Fairy') OR type2 IN ('Ice', 'Dragon', 'Fairy'))
AND legendary = 0
AND Name NOT IN ('Spiritomb', 'Roserade', 'Togekiss', 'Lucario', 'Milotic', 'Garchomp')
AND (Attack > 95 OR SpAtk > 85)
ORDER BY total_stats DESC; 
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,total_stats,Type1,Type2,generation,legendary
0,1006,Iron Valiant,590,Fairy,Fighting,9.0,0.0
1,905,Enamorus Incarnate Forme,580,Fairy,Flying,8.0,0.0
2,987,Flutter Mane,570,Ghost,Fairy,9.0,0.0
3,730,Primarina,530,Water,Fairy,7.0,0.0
4,282,Gardevoir,518,Psychic,Fairy,3.0,0.0


Iron Valiant has the highest stat total at 590, which is only 10 points below Garchomp.

In [60]:
# Iron Valiant's data

query = '''
SELECT *
FROM pokemon
WHERE name = 'Iron Valiant'
'''

cnn.execute(query)
cnn.commit()

result = pd.read_sql_query(query, con=cnn)
result.head()

Unnamed: 0,ID,Name,HP,Attack,Defense,SpAtk,SpDef,Speed,Type1,Type2,generation,legendary
0,1006,Iron Valiant,74,130,90,120,60,116,Fairy,Fighting,9.0,0.0


## Final Team

<div style="text-align: center;">
    <img src="Images/final_team.png" alt="Final Team" width="1200">
</div>

### Data Visualisation via Power BI

In [67]:
# Dashboard

from IPython.display import IFrame
IFrame(src="https://app.powerbi.com/reportEmbed?reportId=3bec943c-c6ee-4d34-afe1-bec9c11c5122&autoAuth=true&ctid=2efd699a-1922-4e69-b601-108008d28a2e",
       height = 1200, width = 1500)

<div style="text-align: center;">
    <img src="Images/1.png" alt="1" width="1200">
</div>

<div style="text-align: center;">
    <img src="Images/2.png" alt="2" width="1200">
</div>

<div style="text-align: center;">
    <img src="Images/3.png" alt="3" width="1200">
</div>

<div style="text-align: center;">
    <img src="Images/4.png" alt="4" width="1200">
</div>

<div style="text-align: center;">
    <img src="Images/5.png" alt="5" width="1200">
</div>

<div style="text-align: center;">
    <img src="Images/6.png" alt="6" width="1200">
</div>

## Conclusion

The data shows that on average the total stats of pokemon have increased over time. The scatter graph shows how stats have changed with generation. You can see a general increase from generation 1 to generation 9.

The selected team has a high chance of defeating Cynthia's team. The radar charts compare the stats, and the bar charts show which types are most effective against each of her pokémon. All the counter pokémon are super effective against their opponents and four of the six have a higher stat total. Each of the selected pokémon also has a higher attack or special attack than the opposing pokémon's defense or special defense. Looking at which generations the selected pokémon are from, we can see the following:

- Primarina is from generation 7
- Metagross is from generation 3
- Raging Bolt is from generation 9
- Kommo-o is from generation 7
- Magnezone is from generation 4
- Iron Valiant is from generation 9

Most of them are from the later generations, with the exception of Metagross and Magnezone, which are from generations 3 and 4 respectively. Magnezone didn't satisfy the condition of having higher or equal total stats and was only selected as the next best choice (due to typing constraints). Metagross, on the other hand, is a pseudo-legendary pokémon, which is a non-legendary pokémon that is considered to be among the strongest in its region. None of the selected pokémon are from generations 1 or 2, which is consistent with the sum of stats generally increasing over time.
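The matchups can be summarised with a few lines of arithmetic. The sketch below simply collects the stat totals and generations reported in the sections above into one list (the `MATCHUPS` structure is made up for this summary) and counts how many counters meet the stat-total criterion.

```python
# (counter, counter total, counter generation, opponent, opponent total),
# all values taken from the query outputs earlier in this notebook
MATCHUPS = [
    ("Primarina", 530, 7, "Spiritomb", 485),
    ("Metagross", 600, 3, "Roserade", 515),
    ("Raging Bolt", 590, 9, "Togekiss", 545),
    ("Kommo-o", 600, 7, "Lucario", 525),
    ("Magnezone", 535, 4, "Milotic", 540),
    ("Iron Valiant", 590, 9, "Garchomp", 600),
]

# Counters whose stat total is at least equal to their opponent's
meets_total = [counter for counter, total, _, _, enemy_total in MATCHUPS
               if total >= enemy_total]
mean_generation = sum(gen for _, _, gen, _, _ in MATCHUPS) / len(MATCHUPS)
team_total = sum(total for _, total, _, _, _ in MATCHUPS)
cynthia_total = sum(enemy_total for _, _, _, _, enemy_total in MATCHUPS)

print(meets_total)
print(mean_generation, team_total, cynthia_total)
```

Magnezone and Iron Valiant are the two picks that fall short of their opponent's total, by 5 and 10 points respectively.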

There are a couple of other things that could determine a pokémon's strength that we haven't taken into account. For example:

- Move pool
- Abilities

However, these are less important here: moves can be taught via TMs, and an ability only helps in specific situations. An example is the ability Levitate, which gives a pokémon immunity to ground type moves. This doesn't really matter against Cynthia's team, which has wide type coverage.

The data therefore supports Professor Oak's hypothesis, and we can conclude that pokémon have indeed gotten stronger over time!

<img src="Images/oak.png" alt="Oak" width="100"/>