<img style="float: right;" src="http://dedalvs.com/kamakawi/words/pikachu.png" width="150" height="150" />
# SQL with Pokemon - Chapter 1

Welcome to the world of Pokemon! Most people who come through here are looking to start their Pokemon journey by catching, training and discovering Pokemon. It is important for people to go out into the unknown and discover secrets about Pokemon that we have yet to unravel. 

However, there's another option you may take, one that is equally important, if not more so. You could become a Pokemon Researcher and help construct our Pokemon Database! Trainers out in the field need a wealth of accurate information to make informed decisions on their journey. You will still get the chance to interact with Pokemon on a daily basis, but your main role is to work with our new SQL technology to create data tables that will best inform our trainers. Your work here in the Research Lab will affect the lives of thousands, maybe even millions of people!

Does that sound like a journey worth embarking on? When you are ready, press START!

## Before we begin...

Before we begin, we need to talk about some terminology. All of our Pokemon data is stored in what we call a Relational Database Management System, or RDBMS for short. We use SQL, which stands for Structured Query Language, to read, write and manage that data. Think of an RDBMS as a large room filled with file cabinets, and SQL as the tool that helps you perform tasks on those file cabinets. As you will learn, SQL is an extremely powerful tool.

NOTE 1: I am using this CSV from Kaggle for this Project: https://www.kaggle.com/abcsds/pokemon
I am also using SQL Server Express and SQL Server Management Studio as my database platform. Feel free to use whichever flavor of SQL you prefer.

NOTE 2: I made a couple of edits in the CSV file. I removed the first column (the numbers) and I changed "Sp. Atk" and "Sp. Def" to "Special Attack" and "Special Defense" respectively. Everything else I kept the same.

NOTE 3: Rather than pasting screenshots of SQL Server into this Jupyter Notebook, I will use pyodbc to connect to SQL Server and pandas to display my results. This keeps the data output simple to present. Other than that, I will use only SQL to manipulate the data.

## Setting things up...

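```python
import numpy as np
import pandas as pd
import pyodbc

# Connect to a local SQL Server Express instance using Windows authentication.
# Adjust the driver name, server and database to match your own setup.
conn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=localhost;"
                      "Database=Pokemon;"
                      "Trusted_Connection=yes;")
cursor = conn.cursor()
```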
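If you don't have SQL Server, SQLite, which ships with Python as the `sqlite3` module, is one lightweight alternative. The sketch below is an illustration under stated assumptions, not the setup used in this chapter: it assumes the edited CSV from NOTE 2 is saved as `Pokemon.csv`, and `load_pokemon_csv` and `convert` are hypothetical helpers written for this example. SQLite accepts the square-bracket column names used throughout this chapter, and `pd.read_sql_query` accepts a `sqlite3` connection, so most of the queries that follow should run against it; SQL Server-specific pieces such as the three-part name `[Pokemon].[dbo].[pokemon]` in Situation 3 would need adjusting.

```python
import csv
import sqlite3

# Columns stored as numbers so comparisons like > 100 behave numerically.
# These names assume the edited CSV described in NOTE 2.
NUMERIC_COLUMNS = {"Total", "HP", "Attack", "Defense", "Special Attack",
                   "Special Defense", "Speed", "Generation"}


def convert(column, value):
    """Turn one CSV cell into a value SQLite can compare sensibly (hypothetical helper)."""
    if value == "":
        return None                    # empty cells (e.g. no Type 2) become NULL
    if value in ("True", "False"):
        return int(value == "True")    # Legendary -> 1 / 0, so "Legendary = 0" works
    if column in NUMERIC_COLUMNS:
        return float(value)
    return value


def load_pokemon_csv(csv_path, conn, table="pokemon"):
    """Copy every CSV row into a new SQLite table and return the row count (hypothetical helper)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[convert(col, cell) for col, cell in zip(header, row)] for row in reader]
    # Double quotes are standard SQL identifier quoting, so "Type 1" stays one name.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Usage, assuming the edited CSV sits next to the notebook as "Pokemon.csv":
# conn = sqlite3.connect("pokemon.db")
# load_pokemon_csv("Pokemon.csv", conn)
# df = pd.read_sql_query("SELECT * FROM pokemon", conn)
```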

<img style="float: right;" src="http://files.shandymedia.com/styles/page_full/s3/images/photos/hollyscoop/9-pidgey.png" width="200" height="200" />
## Your first SQL Queries!

```python
# This Query SELECTs all of our data
df = pd.read_sql_query('''
SELECT *
FROM pokemon
''', conn)

# I then use the head function from Pandas to grab only the first 5 rows to keep things neat
df.head(5)
```

| | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Special Attack | Special Defense | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bulbasaur | Grass | Poison | 318.0 | 45.0 | 49.0 | 49.0 | 65.0 | 65.0 | 45.0 | 1.0 | False |
| 1 | Ivysaur | Grass | Poison | 405.0 | 60.0 | 62.0 | 63.0 | 80.0 | 80.0 | 60.0 | 1.0 | False |
| 2 | Venusaur | Grass | Poison | 525.0 | 80.0 | 82.0 | 83.0 | 100.0 | 100.0 | 80.0 | 1.0 | False |
| 3 | VenusaurMega Venusaur | Grass | Poison | 625.0 | 80.0 | 100.0 | 123.0 | 122.0 | 120.0 | 80.0 | 1.0 | False |
| 4 | Charmander | Fire |  | 309.0 | 39.0 | 52.0 | 43.0 | 60.0 | 50.0 | 65.0 | 1.0 | False |


So let's go through the structure of our first query. The SELECT statement is how we grab the columns we need. In SQL, the * (star) is a special character which means "All". The FROM statement tells SQL which table we want to get data from. 

If we want to say "SELECT * FROM pokemon" in plain English, we would say "I want to select all the columns from the table pokemon".
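To see what "all the columns" means concretely, here is a minimal, self-contained sketch using Python's built-in `sqlite3` module on a tiny in-memory `pokemon` table (two rows and four columns copied from the output above, not the full dataset). `cursor.description` lists the columns the query returned.

```python
import sqlite3

# A tiny stand-in for the real table: two rows and four of its columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Type 2] TEXT, [Generation] INTEGER)")
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?, ?)", [
    ("Bulbasaur", "Grass", "Poison", 1),
    ("Charmander", "Fire", None, 1),
])

cursor = conn.execute("SELECT * FROM pokemon")
# cursor.description has one entry per returned column; item 0 is its name.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # every column in the table, in table order
print(rows)
```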

Did you understand all of that? We could go into detail about how SQL pulls the data and in which order, but that would be putting the cart before the Rapidash, now wouldn't it?

So let's say we want to SELECT only Pokemon Names, both of their types and what generation the Pokemon come from. That query would look like:

```python
# This Query SELECTs Pokemon Name, both of their Types and what generation the Pokemon are from
df = pd.read_sql_query('''
SELECT
    [Name],
    [Type 1],
    [Type 2],
    [Generation]
FROM pokemon
''', conn)

df.head(5)
```

| | Name | Type 1 | Type 2 | Generation |
|---|---|---|---|---|
| 0 | Bulbasaur | Grass | Poison | 1.0 |
| 1 | Ivysaur | Grass | Poison | 1.0 |
| 2 | Venusaur | Grass | Poison | 1.0 |
| 3 | VenusaurMega Venusaur | Grass | Poison | 1.0 |
| 4 | Charmander | Fire |  | 1.0 |


Awesome! A couple of things before we go on. For maximum readability while we are learning, I kept the column names in proper case with spaces in them. In the real world, most people name their columns in all lower case and without spaces (for example, type_1). Because our column names contain spaces, we have to wrap them in square brackets [] so SQL Server understands we mean a single column; brackets are SQL Server's way of quoting a name, while standard SQL uses double quotes. Also, we need to separate columns using commas.
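A minimal sketch of both points, again on a tiny in-memory SQLite table rather than the chapter's SQL Server database: bracket and double-quote quoting both work, an unquoted `Type 1` is a syntax error, and a missing comma quietly turns the next column name into an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Type 2] TEXT, [Generation] INTEGER)")
conn.execute("INSERT INTO pokemon VALUES ('Bulbasaur', 'Grass', 'Poison', 1)")

# Square brackets (SQL Server style, also accepted by SQLite) and double quotes
# (the SQL standard) both tell the database that "Type 1" is one column name.
bracketed = conn.execute("SELECT [Name], [Type 1] FROM pokemon").fetchall()
standard = conn.execute('SELECT "Name", "Type 1" FROM pokemon').fetchall()

# Without quoting, the space splits Type 1 into two tokens and the query fails.
try:
    conn.execute("SELECT Name, Type 1 FROM pokemon")
    unquoted_works = True
except sqlite3.OperationalError:
    unquoted_works = False

# Forgetting the comma does NOT fail: [Type 1] silently becomes an alias for [Name].
cursor = conn.execute("SELECT [Name] [Type 1] FROM pokemon")
alias_columns = [col[0] for col in cursor.description]
alias_rows = cursor.fetchall()

print(bracketed, standard, unquoted_works)
print(alias_columns, alias_rows)
```

The last case is the one to watch for: no error is raised, but the column labelled `Type 1` actually contains names.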

Next, we are going to introduce one of the most important statements in SQL: the WHERE statement. The WHERE statement lets you filter rows, keeping only the ones that meet the conditions you give SQL.

For our next example, let's say we want to search for Type 1 Water types. Using the above SQL query as a blueprint:

```python
# Apply WHERE statement to find Type 1 Water types
df = pd.read_sql_query('''
SELECT
    [Name],
    [Type 1],
    [Type 2],
    [Generation]
FROM pokemon
WHERE [Type 1] = 'Water'
''', conn)

df.head(5)
```

| | Name | Type 1 | Type 2 | Generation |
|---|---|---|---|---|
| 0 | Squirtle | Water |  | 1.0 |
| 1 | Wartortle | Water |  | 1.0 |
| 2 | Blastoise | Water |  | 1.0 |
| 3 | BlastoiseMega Blastoise | Water |  | 1.0 |
| 4 | Psyduck | Water |  | 1.0 |


As we can see, Pokemon such as Bulbasaur and Charmander got filtered out, while Pokemon like Squirtle and Psyduck remain. A simple WHERE condition has three parts: first the column we want to apply a condition to ([Type 1]), then the operator we want to use (in this case, an equals sign), and lastly the value we are looking for ('Water'). Text values go inside single quotes.
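The three parts of the condition are easy to see in a small sketch. It assumes an in-memory SQLite table holding a few of the rows shown above. The second query passes 'Water' as a parameter through a `?` placeholder instead of writing it into the SQL text; pyodbc uses the same `?` placeholder style, which is handy once the value comes from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Type 2] TEXT, [Generation] INTEGER)")
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?, ?)", [
    ("Bulbasaur", "Grass", "Poison", 1),
    ("Charmander", "Fire", None, 1),
    ("Squirtle", "Water", None, 1),
    ("Psyduck", "Water", None, 1),
])

# Column [Type 1], operator =, value 'Water' written directly in the SQL.
literal = conn.execute("SELECT [Name] FROM pokemon WHERE [Type 1] = 'Water'").fetchall()

# The same filter with the value supplied separately through a ? placeholder.
wanted_type = "Water"
param = conn.execute("SELECT [Name] FROM pokemon WHERE [Type 1] = ?", (wanted_type,)).fetchall()

print(literal)
print(param)
```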

We can combine as many conditions as we like in a WHERE statement by joining them with the AND operator; a row is only returned if every condition is true. For example, we can ask the Pokemon database to find Pokemon that are both Water AND Ice types:

```python
# Now let's find Water AND Ice types
df = pd.read_sql_query('''
SELECT
    [Name],
    [Type 1],
    [Type 2],
    [Generation]
FROM pokemon
WHERE [Type 1] = 'Water'
AND [Type 2] = 'Ice'
''', conn)

df.head(5)
```

| | Name | Type 1 | Type 2 | Generation |
|---|---|---|---|---|
| 0 | Dewgong | Water | Ice | 1.0 |
| 1 | Cloyster | Water | Ice | 1.0 |
| 2 | Lapras | Water | Ice | 1.0 |


We can take this one step further by asking the database for Water and Ice types that have a Defense base stat of over 100:

```python
# Let's add one more WHERE Statement, Defense is over 100
df = pd.read_sql_query('''
SELECT
    [Name],
    [Type 1],
    [Type 2],
    [Generation]
FROM pokemon
WHERE [Type 1] = 'Water'
AND [Type 2] = 'Ice'
AND [Defense] > 100
''', conn)

df.head(5)
```

| | Name | Type 1 | Type 2 | Generation |
|---|---|---|---|---|
| 0 | Cloyster | Water | Ice | 1.0 |


With just 3 conditions in our WHERE statement, we narrowed our search down to just Cloyster! You may have noticed that our WHERE statement applied a condition to a column we are not SELECTing ([Defense]). We can apply conditions and logic to columns that we are not returning.
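Here is the same narrowing on a four-row sample table in SQLite. The Defense numbers are the games' base stats for these Pokemon; everything else about the table is a simplification of the real one. Note that `[Defense]` appears only in the WHERE statement, never in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Type 2] TEXT, [Defense] INTEGER)")
# Defense values are the base stats of these four Pokemon.
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?, ?)", [
    ("Squirtle", "Water", None, 65),
    ("Dewgong", "Water", "Ice", 80),
    ("Cloyster", "Water", "Ice", 180),
    ("Lapras", "Water", "Ice", 80),
])

# Two conditions joined by AND: both must be true for a row to survive.
water_ice = conn.execute("""
    SELECT [Name] FROM pokemon
    WHERE [Type 1] = 'Water'
      AND [Type 2] = 'Ice'
    ORDER BY [Name]
""").fetchall()

# A third condition on [Defense], a column we never SELECT.
tough_water_ice = conn.execute("""
    SELECT [Name] FROM pokemon
    WHERE [Type 1] = 'Water'
      AND [Type 2] = 'Ice'
      AND [Defense] > 100
""").fetchall()

print(water_ice)
print(tough_water_ice)
```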

<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/en/a/a5/Pok%C3%A9mon_Charmander_art.png" width="150" height="150" />
## Situation 1 - No more Charmanders?!

Professor Oak back in Pallet Town has an unexpected problem! In the last couple of weeks, there has been an influx of young trainers coming to his lab and picking Charmander as their first Pokemon for their journey. This has led to a shortage of Charmanders! Professor Oak doesn't want to force trainers to pick Bulbasaur or Squirtle, but he doesn't have time to get more Charmanders for next week's trainers.

Instead, Professor Oak is going to have to compromise by selecting a Fire Type that would be suitable for a young trainer. What would be a good choice in the Kanto region for Oak to pick?

```python
# Gen 1 Fire types, excluding Megas and the Charmander line, with Attack under 70
df = pd.read_sql_query('''
SELECT
    [Name],
    [Type 1],
    [Type 2],
    [HP],
    [Attack],
    [Defense],
    [Special Attack],
    [Special Defense],
    [Speed]
FROM pokemon
WHERE [Type 1] = 'Fire'
AND [Generation] = 1
AND [Name] NOT LIKE '%Mega%'
AND [Name] NOT IN ('Charmander', 'Charmeleon', 'Charizard')
AND [Attack] < 70
''', conn)

df.head(10)
```

| | Name | Type 1 | Type 2 | HP | Attack | Defense | Special Attack | Special Defense | Speed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Vulpix | Fire |  | 38.0 | 41.0 | 40.0 | 50.0 | 65.0 | 65.0 |


After running this query against the Pokemon database, Professor Oak settles on Vulpix as an alternative to Charmander!

Lots of things going on here. Let's go through all of them. We decided to SELECT Name, both types and all of the base stats. Next, we want a WHERE statement for Fire Types and Generation 1 Pokemon. 

The hairy stuff comes next. Putting NOT in front of an operator flips the condition, so the WHERE statement excludes the rows that match it instead of keeping them. The operator LIKE is used together with the wildcard %, which goes inside the single quotation marks. The % wildcard stands for any sequence of characters (including none at all), so LIKE lets us match part of a value instead of the whole thing.

So for example, if we searched for 'Slow%', we would find any Pokemon whose name starts with 'Slow', followed by anything. Likewise, if we searched for '%king', we would find any Pokemon whose name ends in 'king', with anything in front of it.
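A small sketch of the two patterns, using SQLite and a hypothetical helper `names_like` over a handful of names. One caveat: whether LIKE is case-sensitive depends on the database (SQLite's LIKE ignores case for ASCII letters by default; in SQL Server it depends on the column's collation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT)")
for name in ["Slowpoke", "Slowbro", "Slowking", "Nidoking", "Pikachu"]:
    conn.execute("INSERT INTO pokemon VALUES (?)", (name,))


def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted A to Z (hypothetical helper)."""
    rows = conn.execute(
        "SELECT [Name] FROM pokemon WHERE [Name] LIKE ? ORDER BY [Name]", (pattern,)
    )
    return [row[0] for row in rows]


starts_with_slow = names_like("Slow%")  # % at the end: anything may follow "Slow"
ends_with_king = names_like("%king")    # % at the start: anything may come before "king"

print(starts_with_slow)
print(ends_with_king)
```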

So what (AND [Name] NOT LIKE '%Mega%') is saying is that we DON'T want any Pokemon with Mega in their name.

(AND [Name] NOT IN ('Charmander', 'Charmeleon', 'Charizard')) is saying we DON'T want to return Pokemon Charmander, Charmeleon or Charizard.

And lastly, we don't want Professor Oak to give out a Pokemon that is too powerful, so we only want Pokemon with an Attack base stat of less than 70.
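Putting all of Professor Oak's conditions together on a small SQLite sample shows why Vulpix is the only Pokemon left: Growlithe's Attack of 70 is not less than 70. The sample rows use the games' base Attack stats; Meganium is included to show a side effect of `'%Mega%'`, which matches "Mega" anywhere in a name. Requiring a trailing space (`'%Mega %'`) is one possible tightening, assuming Mega forms are named like the `VenusaurMega Venusaur` row shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Generation] INTEGER, [Attack] INTEGER)")
# A few Generation 1 Fire types with their base Attack, plus Meganium (Grass, Gen 2).
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?, ?)", [
    ("Charmander", "Fire", 1, 52),
    ("Charizard", "Fire", 1, 84),
    ("CharizardMega Charizard X", "Fire", 1, 130),
    ("Vulpix", "Fire", 1, 41),
    ("Growlithe", "Fire", 1, 70),   # 70 is not < 70, so Growlithe is filtered out
    ("Ponyta", "Fire", 1, 85),
    ("Meganium", "Grass", 2, 82),
])

oak_pick = conn.execute("""
    SELECT [Name] FROM pokemon
    WHERE [Type 1] = 'Fire'
      AND [Generation] = 1
      AND [Name] NOT LIKE '%Mega%'
      AND [Name] NOT IN ('Charmander', 'Charmeleon', 'Charizard')
      AND [Attack] < 70
""").fetchall()

# '%Mega%' matches "Mega" anywhere, so it catches Meganium as well as the Mega form.
loose = conn.execute("SELECT [Name] FROM pokemon WHERE [Name] LIKE '%Mega%' ORDER BY [Name]").fetchall()
# Requiring the space after "Mega" keeps Meganium out of the match.
strict = conn.execute("SELECT [Name] FROM pokemon WHERE [Name] LIKE '%Mega %'").fetchall()

print(oak_pick, loose, strict)
```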

## Finishing up the Basics!

So far we have covered 3 basic SQL Statements: SELECT, FROM and WHERE. Now it's time for us to tackle the last 3 basic SQL Statements: GROUP BY, HAVING and ORDER BY.

Let's start with GROUP BY. GROUP BY collects rows that share the same value in a column into groups, and an aggregate function then calculates one value for each group. Some aggregate functions include COUNT, MAX, MIN, SUM, and AVG. I think it will be easier to explain with an example.
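Before the Water-type example, here is a minimal sketch of the aggregate functions on their own, run with SQLite on the five rows from the first output table in this chapter. Without GROUP BY each aggregate returns a single value for the whole table; with GROUP BY it returns one value per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Total] INTEGER)")
# Total base stats copied from the first output table in this chapter.
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?)", [
    ("Bulbasaur", "Grass", 318),
    ("Ivysaur", "Grass", 405),
    ("Venusaur", "Grass", 525),
    ("VenusaurMega Venusaur", "Grass", 625),
    ("Charmander", "Fire", 309),
])

# Without GROUP BY, each aggregate collapses the whole table into one value.
whole_table = conn.execute(
    "SELECT COUNT(*), MIN([Total]), MAX([Total]), SUM([Total]), AVG([Total]) FROM pokemon"
).fetchone()

# With GROUP BY, rows sharing a [Type 1] value form one group,
# and each aggregate is computed once per group.
per_type = conn.execute("""
    SELECT [Type 1], COUNT(*), MAX([Total])
    FROM pokemon
    GROUP BY [Type 1]
    ORDER BY [Type 1]
""").fetchall()

print(whole_table)
print(per_type)
```

Here `COUNT(*)` counts rows, while `COUNT(column)` skips rows where that column is NULL, which matters for columns like Type 2.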

So let's say we want to COUNT the number of Type 1 Water types in each Generation:

```python
df = pd.read_sql_query('''
SELECT
    [Generation],
    COUNT([Type 1]) AS 'Number of Water Types'
FROM pokemon
WHERE [Type 1] = 'Water'
GROUP BY [Generation]
''', conn)

df.head(10)
```

| | Generation | Number of Water Types |
|---|---|---|
| 0 | 1.0 | 31 |
| 1 | 2.0 | 18 |
| 2 | 3.0 | 27 |
| 3 | 4.0 | 13 |
| 4 | 5.0 | 18 |
| 5 | 6.0 | 5 |


Several things are going on here. We use a WHERE statement to grab only our Water types. Our new SQL statement, GROUP BY, is then used on the Generation column, so we get one row per Generation. Since we want to COUNT the number of Water types, we add COUNT() to the SELECT statement and put [Type 1] inside the parentheses.

What comes after COUNT([Type 1]) is an alias for the column. The AS keyword lets us rename a column in our results to whatever we want; here, I named it 'Number of Water Types'.

Based on this GROUP BY, we see that Generation 1 has 31 Water Types, Generation 2 has 18 Water Types and so on.
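The same pattern on a seven-row SQLite sample (real Pokemon, but not the full dataset) shows the order of operations: WHERE removes Charmander first, then GROUP BY counts what is left in each Generation, and the alias becomes the result's column name. This sketch uses a bracketed alias, `[Number of Water Types]`; the single-quoted form above also worked in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Generation] INTEGER)")
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?)", [
    ("Squirtle", "Water", 1), ("Wartortle", "Water", 1), ("Psyduck", "Water", 1),
    ("Totodile", "Water", 2), ("Croconaw", "Water", 2),
    ("Mudkip", "Water", 3),
    ("Charmander", "Fire", 1),
])

cursor = conn.execute("""
    SELECT [Generation], COUNT([Type 1]) AS [Number of Water Types]
    FROM pokemon
    WHERE [Type 1] = 'Water'      -- filter rows first ...
    GROUP BY [Generation]         -- ... then count what is left per generation
    ORDER BY [Generation]
""")
headers = [col[0] for col in cursor.description]  # the alias becomes the column name
counts = cursor.fetchall()

print(headers)
print(counts)
```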

Let's do another one. Let's find the average (AVG) Special Attack for each Type 1:

```python
df = pd.read_sql_query('''
SELECT
    [Type 1],
    AVG([Special Attack]) AS 'Average Special Attack'
FROM pokemon
GROUP BY [Type 1]
''', conn)

df.head(20)
```

| | Type 1 | Average Special Attack |
|---|---|---|
| 0 | Bug | 53.869565 |
| 1 | Dark | 74.645161 |
| 2 | Dragon | 96.84375 |
| 3 | Electric | 90.022727 |
| 4 | Fairy | 78.529412 |
| 5 | Fighting | 53.111111 |
| 6 | Fire | 88.980769 |
| 7 | Flying | 94.25 |
| 8 | Ghost | 79.34375 |
| 9 | Grass | 77.5 |


Let's build on top of that Query. We can further filter down this query using a HAVING statement. Think of a HAVING statement as filtering data AFTER a GROUP BY has been applied. 

So let's say we want to grab only the Pokemon Types that have an Average Special Attack of above 70:

```python
df = pd.read_sql_query('''
SELECT
    [Type 1],
    AVG([Special Attack]) AS 'Average Special Attack'
FROM pokemon
GROUP BY [Type 1]
HAVING AVG([Special Attack]) > 70
''', conn)

df.head(20)
```

| | Type 1 | Average Special Attack |
|---|---|---|
| 0 | Dark | 74.645161 |
| 1 | Dragon | 96.84375 |
| 2 | Electric | 90.022727 |
| 3 | Fairy | 78.529412 |
| 4 | Fire | 88.980769 |
| 5 | Flying | 94.25 |
| 6 | Ghost | 79.34375 |
| 7 | Grass | 77.5 |
| 8 | Ice | 77.541667 |
| 9 | Psychic | 98.403509 |
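To make the WHERE-versus-HAVING distinction concrete, here is a sketch on six Generation 1 Pokemon (their base Special Attack stats) in SQLite. HAVING filters groups after their averages exist; trying to put the aggregate in WHERE fails because WHERE runs before any grouping happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Special Attack] INTEGER)")
# Base Special Attack stats for six Generation 1 Pokemon.
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?)", [
    ("Bulbasaur", "Grass", 65), ("Ivysaur", "Grass", 80),
    ("Charmander", "Fire", 60), ("Charmeleon", "Fire", 80),
    ("Abra", "Psychic", 105), ("Kadabra", "Psychic", 120),
])

# HAVING filters the groups after AVG has been computed for each one.
# Fire averages exactly 70.0, so the strict > drops it.
above_70 = conn.execute("""
    SELECT [Type 1], AVG([Special Attack]) AS [Average Special Attack]
    FROM pokemon
    GROUP BY [Type 1]
    HAVING AVG([Special Attack]) > 70
    ORDER BY [Type 1]
""").fetchall()

# WHERE runs before grouping, when no averages exist yet, so this is rejected.
try:
    conn.execute("SELECT [Type 1] FROM pokemon WHERE AVG([Special Attack]) > 70 GROUP BY [Type 1]")
    where_allowed = True
except sqlite3.OperationalError:
    where_allowed = False

print(above_70, where_allowed)
```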


Great! Now, let's sort the Average Special Attack in order from greatest to least:

```python
# Sort by the alias; square brackets mark it as a column name rather than a text value
df = pd.read_sql_query('''
SELECT
    [Type 1],
    AVG([Special Attack]) AS 'Average Special Attack'
FROM pokemon
GROUP BY [Type 1]
HAVING AVG([Special Attack]) > 70
ORDER BY [Average Special Attack] DESC
''', conn)

df.head(20)
```

| | Type 1 | Average Special Attack |
|---|---|---|
| 0 | Psychic | 98.403509 |
| 1 | Dragon | 96.84375 |
| 2 | Flying | 94.25 |
| 3 | Electric | 90.022727 |
| 4 | Fire | 88.980769 |
| 5 | Ghost | 79.34375 |
| 6 | Fairy | 78.529412 |
| 7 | Ice | 77.541667 |
| 8 | Grass | 77.5 |
| 9 | Water | 74.8125 |


Perfect! ORDER BY sorts the results by the column we give it (in this case, we can even sort by an alias; more on that later). The default is ascending order: A to Z for text and least to greatest for numerical values. If you add DESC after the column you want to sort by, you will get greatest to least instead.
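A small SQLite sketch of ascending versus descending order, using the same kind of grouped averages. It sorts by the bracketed alias `[Average Special Attack]`, which SQLite and SQL Server both resolve to the computed column. Writing the alias in single quotes, as a text value, is less portable: other databases may reject it or treat it as a constant rather than a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Type 1] TEXT, [Special Attack] INTEGER)")
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?)", [
    ("Bulbasaur", "Grass", 65), ("Ivysaur", "Grass", 80),
    ("Charmander", "Fire", 60), ("Charmeleon", "Fire", 80),
    ("Abra", "Psychic", 105), ("Kadabra", "Psychic", 120),
])

base = """SELECT [Type 1], AVG([Special Attack]) AS [Average Special Attack]
FROM pokemon GROUP BY [Type 1] """

# No direction keyword: ascending (least to greatest) is the default.
default_order = conn.execute(base + "ORDER BY [Average Special Attack]").fetchall()
# DESC flips it to greatest to least.
descending = conn.execute(base + "ORDER BY [Average Special Attack] DESC").fetchall()
# Text columns sort A to Z by default.
by_name = [row[0] for row in conn.execute("SELECT [Name] FROM pokemon ORDER BY [Name]")]

print(default_order)
print(descending)
print(by_name)
```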

<img style="float: right;" src="https://cdn.bulbagarden.net/upload/f/fa/164Noctowl.png" width="200" height="200" />
## Situation 2 - Pewter City under attack!

Team Rocket is attacking Pewter City! Brock is out of town and you happen to be the only veteran trainer ready to stop them from stealing the fossils from the Pewter Museum! The Pokemon they are using are Koffings, Ekans, and Zubats. Does your team have the strength to defeat Team Rocket!?

Your Pokemon party at the moment is Quilava, Vulpix and Growlithe. You cannot use the Pokemon Center to retrieve your other Pokemon, so you'll have to fight with what you have.

We need to see if your party's average strength is enough to defeat Team Rocket! As a rough measure, we'll compare the average Total base stats of each team.

```python
# Your Party AVG Strength
df = pd.read_sql_query('''
SELECT
    [Type 1],
    AVG([Total]) AS 'Average Total Base Stats'
FROM pokemon
WHERE [Type 1] = 'Fire'
AND [Name] IN ('Quilava', 'Vulpix', 'Growlithe')
GROUP BY [Type 1]
''', conn)

df.head(20)
```

| | Type 1 | Average Total Base Stats |
|---|---|---|
| 0 | Fire | 351.333333 |


```python
# Team Rocket's AVG Strength
df = pd.read_sql_query('''
SELECT
    [Type 1],
    AVG([Total]) AS 'Average Total Base Stats'
FROM pokemon
WHERE [Type 1] = 'Poison'
AND [Name] IN ('Koffing', 'Ekans', 'Zubat')
GROUP BY [Type 1]
''', conn)

df.head(20)
```

| | Type 1 | Average Total Base Stats |
|---|---|---|
| 0 | Poison | 291.0 |


It seems like your party is quite a bit stronger than Team Rocket (an average Total of about 351 versus 291). We should be able to take them down easily!
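The two queries could also be folded into one. The sketch below (SQLite, a six-row table) labels each Pokemon with a side using CASE and then averages per side, so the comparison no longer relies on each team sharing a single Type 1. The Total values are the games' base stat totals, and they reproduce the two averages above; the column name `[Side]` and the alias `battle` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Total] INTEGER)")
# Total base stats for the six Pokemon in this battle.
conn.executemany("INSERT INTO pokemon VALUES (?, ?)", [
    ("Quilava", 405), ("Vulpix", 299), ("Growlithe", 350),  # your party
    ("Koffing", 340), ("Ekans", 288), ("Zubat", 245),       # Team Rocket
])

# The inner query labels each row with a side; the outer query averages per side.
sides = conn.execute("""
    SELECT [Side], AVG([Total]) AS [Average Total Base Stats]
    FROM (
        SELECT CASE WHEN [Name] IN ('Quilava', 'Vulpix', 'Growlithe')
                    THEN 'Your party' ELSE 'Team Rocket' END AS [Side],
               [Total]
        FROM pokemon
        WHERE [Name] IN ('Quilava', 'Vulpix', 'Growlithe', 'Koffing', 'Ekans', 'Zubat')
    ) AS battle
    GROUP BY [Side]
    ORDER BY [Side]
""").fetchall()

print(sides)
```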

<img style="float: right;" src="https://cdn.bulbagarden.net/upload/thumb/9/92/FireRed_LeafGreen_Blaine.png/436px-FireRed_LeafGreen_Blaine.png" width="200" height="200" />
## Situation 3 - So many choices!?

You are about to challenge Blaine to a Gym Battle! After being defeated by Red and Gary Oak, Blaine has been training extra hard to make sure he doesn't lose so easily to younger trainers. Some of the Pokemon on his roster include Ninetales, Camerupt, Ampharos, Jumpluff and Arcanine.

The mix of Electric and Grass types in his roster will make it difficult to fight him with just Water or Ground types. What will you do?

With such an evolved team, you need a powerful roster of your own. Next, we need to pick some Pokemon that can contend with all of his types. Let's see what we can do...

```python
# Find fast, hard-hitting Water and Ground types to take on Blaine
df = pd.read_sql_query('''
SELECT
    [Name],
    [Type 1],
    [Type 2],
    [Total],
    [HP],
    [Attack],
    [Defense],
    [Special Attack],
    [Special Defense],
    [Speed]
FROM [Pokemon].[dbo].[pokemon]
WHERE ([Special Attack] > 100 OR [Attack] > 100)
AND [Legendary] = 0
AND [Name] NOT LIKE '%Mega%'
AND [Type 1] IN ('Water', 'Ground')
AND [Speed] > 60
ORDER BY [Special Attack] DESC
''', conn)

df.head(20)
```

| | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Special Attack | Special Defense | Speed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KeldeoOrdinary Forme | Water | Fighting | 580.0 | 91.0 | 72.0 | 90.0 | 129.0 | 90.0 | 108.0 |
| 1 | KeldeoResolute Forme | Water | Fighting | 580.0 | 91.0 | 72.0 | 90.0 | 129.0 | 90.0 | 108.0 |
| 2 | Vaporeon | Water |  | 525.0 | 130.0 | 65.0 | 60.0 | 110.0 | 95.0 | 65.0 |
| 3 | Samurott | Water |  | 528.0 | 95.0 | 100.0 | 85.0 | 108.0 | 70.0 | 70.0 |
| 4 | Greninja | Water | Dark | 530.0 | 72.0 | 95.0 | 67.0 | 103.0 | 71.0 | 122.0 |
| 5 | Sharpedo | Water | Dark | 460.0 | 70.0 | 120.0 | 40.0 | 95.0 | 40.0 | 95.0 |
| 6 | Floatzel | Water |  | 495.0 | 85.0 | 105.0 | 55.0 | 85.0 | 50.0 | 115.0 |
| 7 | Feraligatr | Water |  | 530.0 | 85.0 | 105.0 | 100.0 | 79.0 | 83.0 | 78.0 |
| 8 | Krookodile | Ground | Dark | 519.0 | 95.0 | 117.0 | 80.0 | 65.0 | 70.0 | 92.0 |
| 9 | Gyarados | Water | Flying | 540.0 | 95.0 | 125.0 | 79.0 | 60.0 | 100.0 | 81.0 |


Lots of stuff going on here. The two big conditions are that we want Pokemon with a Special Attack OR an Attack base stat above 100. On top of that, we don't want Megas or Legendaries, the Type 1 has to be Water or Ground, and their Speed has to be above 60. Overall, we want relatively quick Pokemon that are really powerful.
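SQL evaluates AND before OR, so when a WHERE statement mixes the two, the grouping decides which rows come back. Repeating the shared conditions on both sides of the OR, or wrapping the OR in parentheses, gives the same result; writing them only once without parentheses does not. A minimal SQLite sketch with three Pokemon (Vaporeon and Sharpedo as in the table above, plus Empoleon with its base stats of 86 Attack, 111 Special Attack and 60 Speed) and a hypothetical `names` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon ([Name] TEXT, [Attack] INTEGER, [Special Attack] INTEGER, [Speed] INTEGER)")
conn.executemany("INSERT INTO pokemon VALUES (?, ?, ?, ?)", [
    ("Vaporeon", 65, 110, 65),
    ("Sharpedo", 120, 95, 95),
    ("Empoleon", 86, 111, 60),  # strong Special Attack, but Speed is not above 60
])


def names(where):
    """Return the names matching a WHERE condition, sorted A to Z (hypothetical helper)."""
    sql = f"SELECT [Name] FROM pokemon WHERE {where} ORDER BY [Name]"
    return [row[0] for row in conn.execute(sql)]


# Shared condition repeated on both sides of the OR.
repeated = names("[Special Attack] > 100 AND [Speed] > 60 OR [Attack] > 100 AND [Speed] > 60")
# Shared condition written once, with parentheses around the OR.
grouped = names("([Special Attack] > 100 OR [Attack] > 100) AND [Speed] > 60")
# Shared condition written once WITHOUT parentheses: AND binds first, so this means
# [Special Attack] > 100 OR ([Attack] > 100 AND [Speed] > 60), which lets Empoleon in.
unparenthesized = names("[Special Attack] > 100 OR [Attack] > 100 AND [Speed] > 60")

print(repeated, grouped, unparenthesized)
```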

Looking at the list, we have some good choices here:

- Excadrill, at the bottom of the full list (below the rows shown above), has 135 Attack, the highest Attack of any Pokemon in the results.
- Vaporeon has high HP and decent Special Attack.
- Gyarados is pretty strong too, but needs to avoid Blaine's Ampharos.
- Greninja is the fastest Pokemon in the list, and still has 103 Special Attack.

Looking at what we have, I'd say Excadrill, Vaporeon, Gyarados and Greninja would be fantastic choices against Blaine's party. Let's challenge him!

## Wrapping things up

In this first section of Learning SQL with Pokemon, we learned the 6 basic SQL statements to query an RDBMS:

- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY