## Project: Querying and Filtering Pokemon data

This project will help you practice your pandas querying and filtering skills. Let's begin!

<center>
<img src="./mikel-DypO_XgAE4Y-unsplash.jpg" >
    <p align="center">
        Photo by <a href="https://unsplash.com/@mykelgran?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Mikel</a> on <a href="https://unsplash.com/s/photos/pokemon?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>.
    </p>
</center>  

### Task 0 - Setup

There isn't much to do here: we provide the required imports and read the Pokemon CSV we'll be working with.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("pokemon.csv")

df.head()

df.info()

df.describe()

#### Distribution of Pokemon Types:

df['Type 1'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='tab20c', figsize=(10, 8))

#### Distribution of Pokemon Totals:

df['Total'].plot(kind='hist', figsize=(10, 8))

df['Total'].plot(kind='box', vert=False, figsize=(10, 5))

#### Distribution of Legendary Pokemon:

df['Legendary'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))

### Basic filtering

Let's start with a few simple activities regarding filtering.
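Every filter in this section follows the same pattern: a comparison on a column produces a boolean Series (the mask), and `df.loc[mask]` keeps only the rows where the mask is `True`. A minimal sketch on a tiny hand-made DataFrame (names and stats are hypothetical, not taken from `pokemon.csv`):

```python
import pandas as pd

# Hypothetical toy data standing in for pokemon.csv
toy = pd.DataFrame({
    'Name': ['Alpha', 'Bravo', 'Charlie', 'Delta'],
    'Attack': [45, 160, 90, 155],
})

# The comparison returns one True/False value per row
mask = toy['Attack'] > 150
print(mask.tolist())  # [False, True, False, True]

# .loc keeps only the rows where the mask is True
strong = toy.loc[mask]
print(strong['Name'].tolist())  # ['Bravo', 'Delta']
```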

##### 1. How many Pokemon have an `Attack` value greater than 150?

A little visual exploration gives us a sense of which Pokemon are the most "powerful" (measured here by their `Attack` value). A boxplot is a good way to visualize this:

sns.boxplot(data=df, x='Attack')

strong_attack_df = df.loc[df['Attack'] > 150]
strong_attack_df.shape[0]  # number of rows, i.e. how many Pokemon match
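To answer a "how many" question you don't need to look at the rows at all. Since `True` counts as 1, summing the mask gives the count directly; `len()` or `.shape[0]` of the filtered frame give the same number. A sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Attack': [45, 160, 90, 155, 151]})

mask = toy['Attack'] > 150
count_from_sum = int(mask.sum())           # True counts as 1
count_from_len = len(toy.loc[mask])        # rows in the filtered frame
count_from_shape = toy.loc[mask].shape[0]  # first element of (rows, columns)
print(count_from_sum, count_from_len, count_from_shape)  # 3 3 3
```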

##### 2. Select all Pokemon with a `Speed` of 10 or less

sns.boxplot(data=df, x='Speed')

slow_pokemons_df = df.loc[df['Speed'] <= 10]
slow_pokemons_df

df.query('Speed <= 10')
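The `.loc` and `.query` versions select exactly the same rows; `query` just takes the condition as a string and can refer to Python variables with the `@` prefix. A sketch on hypothetical toy data (`max_speed` is an illustrative variable name):

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie'], 'Speed': [5, 10, 80]})

via_loc = toy.loc[toy['Speed'] <= 10]
via_query = toy.query('Speed <= 10')
print(via_loc.equals(via_query))  # True

# @ lets the query string use a local Python variable
max_speed = 10
via_variable = toy.query('Speed <= @max_speed')
print(via_variable['Name'].tolist())  # ['Alpha', 'Bravo']
```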

##### 3. How many Pokemon have a `Sp. Def` value of 25 or less?

df.loc[df['Sp. Def'] <= 25].shape[0]
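The same question can be answered with `query`, but a column name like `Sp. Def` is not a valid Python identifier, so in recent pandas versions it has to be wrapped in backticks inside the query string. A sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie'], 'Sp. Def': [20, 25, 90]})

# Backticks let query() refer to a column whose name contains a space and a dot
low_sp_def = toy.query('`Sp. Def` <= 25')
print(low_sp_def.shape[0])  # 2
```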

##### 4. Select all the Legendary Pokemon

# Try your code here
legendary_df = df.loc[df['Legendary']]  # the bool column is already a mask
legendary_df
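Because `Legendary` is a boolean column, it can be used as a mask on its own, and `~` inverts it to get the non-legendary rows. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie'],
                    'Legendary': [False, True, False]})

# A bool column is already a mask, so "== True" is not needed
legendary = toy.loc[toy['Legendary']]
# ~ inverts the mask to select the non-legendary rows
non_legendary = toy.loc[~toy['Legendary']]
print(legendary['Name'].tolist())      # ['Bravo']
print(non_legendary['Name'].tolist())  # ['Alpha', 'Charlie']
```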

##### 5. Find the outlier

Find the Pokemon that is clearly an outlier in terms of `Attack` vs. `Defense`:

ax = sns.scatterplot(data=df, x="Defense", y="Attack")
ax.annotate(
    "Who's this guy?", xy=(228, 10), xytext=(150, 10), color='red',
    arrowprops=dict(arrowstyle="->", color='red')
)

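# Very high Defense combined with very low Attack, matching the annotated point
df.loc[(df['Defense'] >= 200) & (df['Attack'] <= 50)]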
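Instead of reading thresholds off the chart, the outlier can also be located numerically, for example by ranking rows on the ratio `Defense / Attack`. This ratio is only one reasonable way to score "all defense, no attack" (an assumption, not part of the exercise). A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    'Name': ['Alpha', 'Bravo', 'Charlie', 'Delta'],
    'Attack': [80, 10, 140, 60],
    'Defense': [70, 230, 230, 50],
})

# Hypothetical outlier score: high Defense paired with very low Attack
ratio = toy['Defense'] / toy['Attack']
# idxmax gives the index label of the largest score
outlier = toy.loc[ratio.idxmax()]
print(outlier['Name'])  # Bravo
```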

### Advanced selection

Now let's combine conditions with the element-wise boolean operators `&` (and), `|` (or) and `~` (not) to build more advanced expressions.
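A sketch of the three operators on hypothetical toy data. Two details matter: each inline comparison needs its own parentheses because `&` and `|` bind more tightly than `==`, and Python's own `and`/`or` do not work element-wise on a Series (they raise a `ValueError`):

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    'Type 1': ['Fire', 'Fire', 'Water'],
    'Type 2': ['Flying', None, 'Flying'],
})

is_fire = toy['Type 1'] == 'Fire'
is_flying = toy['Type 2'] == 'Flying'

both = toy.loc[is_fire & is_flying]    # and
either = toy.loc[is_fire | is_flying]  # or
not_fire = toy.loc[~is_fire]           # not
print(len(both), len(either), len(not_fire))  # 1 3 1

# Written inline, each comparison must be parenthesized
inline = toy.loc[(toy['Type 1'] == 'Fire') & (toy['Type 2'] == 'Flying')]
print(inline.equals(both))  # True

# Python's `and` tries to reduce a whole Series to one bool and fails
raised = False
try:
    toy.loc[is_fire and is_flying]
except ValueError:
    raised = True
print(raised)  # True
```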

##### 6. How many Fire-Flying Pokemon are there?

df.loc[(df['Type 1'] == 'Fire') & (df['Type 2'] == 'Flying')].shape[0]
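Single-type Pokemon have an empty `Type 2`, which `read_csv` loads as `NaN`. `NaN` never equals a string, so those rows drop out of `==` filters automatically, but they do pass `!=` filters. A sketch on hypothetical toy data showing the difference and the explicit `isna()` alternative:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie'],
                    'Type 1': ['Fire', 'Fire', 'Fire'],
                    'Type 2': ['Flying', np.nan, 'Ground']})

# NaN never equals a string
print((toy['Type 2'] == 'Flying').tolist())  # [True, False, False]
# ...but NaN != 'Flying' is True, which is easy to overlook
print((toy['Type 2'] != 'Flying').tolist())  # [False, True, True]

# isna() selects the single-type rows explicitly
single_type = toy.loc[toy['Type 2'].isna()]
print(single_type['Name'].tolist())  # ['Bravo']
```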

##### 7. How many Poison Pokemon are there, counting both `Type 1` and `Type 2`?

df.loc[(df['Type 1'] == 'Poison') | (df['Type 2'] == 'Poison')]

df.loc[(df['Type 1'] == 'Poison') | (df['Type 2'] == 'Poison')].shape[0]
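An alternative to chaining `|` is to check both type columns at once: `isin` marks every matching cell, and `any(axis=1)` collapses each row to a single flag. A sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie', 'Delta'],
                    'Type 1': ['Grass', 'Poison', 'Water', 'Bug'],
                    'Type 2': ['Poison', np.nan, 'Ground', 'Poison']})

# True where a cell is 'Poison'; any(axis=1) is True if either column matches
has_poison = toy[['Type 1', 'Type 2']].isin(['Poison']).any(axis=1)
print(toy.loc[has_poison, 'Name'].tolist())  # ['Alpha', 'Bravo', 'Delta']
print(int(has_poison.sum()))  # 3
```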

##### 8. Which Pokemon of `Type 1` *Ice* has the strongest defense?

df.loc[df['Type 1'] == 'Ice'].sort_values(by='Defense', ascending=False)
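When only the top row is needed, `nlargest` or `idxmax` avoid sorting the whole frame. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie'],
                    'Type 1': ['Ice', 'Ice', 'Fire'],
                    'Defense': [100, 184, 200]})

ice = toy.loc[toy['Type 1'] == 'Ice']
# nlargest(n, column) returns the n rows with the largest values
top_ice = ice.nlargest(1, 'Defense')
print(top_ice['Name'].tolist())  # ['Bravo']
# idxmax returns the index label of the (first) maximum
print(ice.loc[ice['Defense'].idxmax(), 'Name'])  # Bravo
```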

##### 9. What's the most common type of Legendary Pokemon?

legendary_df['Type 1'].value_counts()

legendary_df['Type 1'].mode()
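`mode()` returns a Series because several values can tie for the highest count; `value_counts()` shows the counts themselves, most frequent first. A sketch with hypothetical type labels containing a tie:

```python
import pandas as pd

# Hypothetical labels (not the real dataset), with a tie between two types
types = pd.Series(['Psychic', 'Dragon', 'Psychic', 'Dragon', 'Fire'])

# Counts per value, sorted by frequency (descending)
counts = types.value_counts()
print(counts)
# mode returns every value tied for the highest count
print(types.mode().tolist())  # ['Dragon', 'Psychic']
```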

##### 10. What's the most powerful Water-type Pokemon from the first 3 generations?

df.loc[(df['Type 1'] == 'Water') & (df['Generation'] <= 3)].sort_values(by='Total', ascending=False)
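The same selection can be written as a single `query` string, where `and` is allowed and the `Type 1` column is backtick-quoted, followed by `nlargest` to keep only the top row. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie', 'Delta'],
                    'Type 1': ['Water', 'Water', 'Water', 'Fire'],
                    'Generation': [1, 3, 4, 2],
                    'Total': [500, 530, 600, 580]})

# Inside a query string, `and` combines conditions element-wise
water_early = toy.query("`Type 1` == 'Water' and Generation <= 3")
best = water_early.nlargest(1, 'Total')
print(best['Name'].tolist())  # ['Bravo']
```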

##### 11. What's the most powerful Dragon from the last two generations?

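# "Last two generations" = the newest generation in the data and the one before it
is_dragon = (df['Type 1'] == 'Dragon') | (df['Type 2'] == 'Dragon')
df.loc[is_dragon & (df['Generation'] >= df['Generation'].max() - 1)].sort_values(by='Total', ascending=False)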
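Two helpers make this kind of range-plus-either-type filter shorter: `Series.between`, which is inclusive on both ends by default, and `DataFrame.eq(...).any(axis=1)` to test both type columns together. A sketch on hypothetical toy data, deriving "the last two generations" from the data instead of hard-coding them:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['Alpha', 'Bravo', 'Charlie'],
                    'Type 1': ['Dragon', 'Steel', 'Dragon'],
                    'Type 2': [None, 'Dragon', 'Flying'],
                    'Generation': [6, 5, 4],
                    'Total': [600, 680, 700]})

last_gen = toy['Generation'].max()
# between() includes both end points by default
recent = toy['Generation'].between(last_gen - 1, last_gen)
is_dragon = toy[['Type 1', 'Type 2']].eq('Dragon').any(axis=1)
best_dragon = toy.loc[recent & is_dragon].nlargest(1, 'Total')
print(best_dragon['Name'].tolist())  # ['Bravo']
```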

##### 12. Select the most powerful Fire-type Pokemon

# Try your code here
# "Powerful" is taken here to mean an Attack above 100
powerful_fire_df = df.loc[(df['Type 1'] == 'Fire') & (df['Attack'] > 100)]
powerful_fire_df
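The `Attack > 100` cut-off is a fixed threshold. A data-driven alternative (an assumption, not the exercise's definition) is to keep the Fire types whose `Total` is in the top quarter of all Fire types. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                    'Type 1': ['Fire'] * 4 + ['Water'],
                    'Total': [300, 400, 500, 600, 700]})

fire = toy.loc[toy['Type 1'] == 'Fire']
# 75th percentile computed among Fire types only (linear interpolation)
cutoff = fire['Total'].quantile(0.75)
powerful_fire = fire.loc[fire['Total'] >= cutoff]
print(cutoff, powerful_fire['Name'].tolist())  # 525.0 ['D']
```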


##### 13. Select all Pokemon that are both Water-type and Flying-type

# Try your code here
water_flying_df = df.loc[(df['Type 1'] == 'Water') & (df['Type 2'] == 'Flying')]
water_flying_df
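The filter above only matches Water as the primary type and Flying as the secondary type. If the order should not matter, both orderings can be OR-ed together. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Type 1': ['Water', 'Flying', 'Water'],
                    'Type 2': ['Flying', 'Water', None]})

water_then_flying = (toy['Type 1'] == 'Water') & (toy['Type 2'] == 'Flying')
flying_then_water = (toy['Type 1'] == 'Flying') & (toy['Type 2'] == 'Water')
either_order = toy.loc[water_then_flying | flying_then_water, 'Name']
print(either_order.tolist())  # ['A', 'B']
```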

##### 14. Select specific columns of Legendary pokemons of type Fire

# Try your code here
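# Build the mask from legendary_df itself so its index matches the frame being filtered
legendary_fire_df = legendary_df.loc[legendary_df['Type 1'] == 'Fire']
legend_fire_pokemon = legendary_fire_df[['Name', 'Attack', 'Generation']]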
legend_fire_pokemon

legendary_fire_df = df.loc[(df['Type 1'] == 'Fire') & df['Legendary'], ['Name', 'Attack', 'Generation']]
legendary_fire_df

##### 15. Select Slow and Fast pokemons

This is the distribution of the Pokemon's `Speed` values. The red lines mark the 5th and 95th percentiles, i.e. the cut-offs for the slowest 5% and the fastest 5% of Pokemon:

ax = df['Speed'].plot(kind='hist', figsize=(10, 5), bins=100)
ax.axvline(df['Speed'].quantile(.05), color='red')
ax.axvline(df['Speed'].quantile(.95), color='red')

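# Slowest 5% and fastest 5% by Speed, using the same cut-offs as the red lines
slow_fast_df = df.loc[(df['Speed'] < df['Speed'].quantile(.05)) | (df['Speed'] > df['Speed'].quantile(.95))]
slow_fast_df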
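Selecting both tails can also be expressed by negating `Series.between`: since `between` includes its end points, `~between(low, high)` keeps exactly the values strictly below `low` or strictly above `high`. A sketch on hypothetical speed values:

```python
import pandas as pd

# Hypothetical speeds (not the real dataset)
speed = pd.Series([5, 15, 30, 45, 60, 75, 90, 105, 120, 160])
low, high = speed.quantile(0.05), speed.quantile(0.95)

# Negated between(): strictly outside [low, high]
tails = speed[~speed.between(low, high)]
explicit = speed[(speed < low) | (speed > high)]
print(tails.equals(explicit))  # True
print(tails.tolist())  # [5, 160]
```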

##### 16. Find the Ultra Powerful Legendary Pokemon

fig, ax = plt.subplots(figsize=(14, 7))
sns.scatterplot(data=df, x="Defense", y="Attack", hue='Legendary', ax=ax)
ax.annotate(
    "Who's this guy?", xy=(140, 150), xytext=(160, 150), color='red',
    arrowprops=dict(arrowstyle="->", color='red')
)

# Try your code here
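The exercise doesn't define "ultra powerful". One possible reading (an assumption) is a Legendary Pokemon that is high on both axes of the plot at once. This sketch ranks Legendary rows by the smaller of their `Attack` and `Defense`, so a row only scores high when both stats are high; it uses hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                    'Attack': [150, 190, 100, 160],
                    'Defense': [140, 100, 200, 150],
                    'Legendary': [True, True, True, False]})

legendary = toy.loc[toy['Legendary']]
# Row-wise minimum: high only when BOTH Attack and Defense are high
balanced = legendary[['Attack', 'Defense']].min(axis=1)
print(legendary.loc[balanced.idxmax(), 'Name'])  # A
```

Applied to the real data, the same two lines would run on `df` instead of `toy`; whether the result is the annotated point is something to check against the plot, not something this sketch guarantees.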

### The End!
https://www.youtube.com/watch?v=gtjxAH8uaP0

import numpy as np 