## Pandas Exercises

We have provided you with a dataset about Pokemon from [Kaggle](https://www.kaggle.com/datasets/abcsds/pokemon) as a CSV file.

The first task is to import pandas following the common convention and load the CSV file as a dataframe into a variable called `pokemon_df`.

In [None]:
import pandas as pd

pokemon_df = pd.read_csv("Pokemon.csv")

Next, familiarize yourself with the columns and their datatypes.

What are the columns?

How many entries are there?

Which are the numerical columns?

Are there missing values? If so, is this because the dataset is incomplete, or is the column just not applicable to those entries?

In [2]:
print(pokemon_df.head())
pokemon_df.info()
print(len(pokemon_df))
len(pokemon_df.dropna(axis=0))

   #                   Name Type 1  Type 2  Total  HP  Attack  Defense  \
0  1              Bulbasaur  Grass  Poison    318  45      49       49   
1  2                Ivysaur  Grass  Poison    405  60      62       63   
2  3               Venusaur  Grass  Poison    525  80      82       83   
3  3  VenusaurMega Venusaur  Grass  Poison    625  80     100      123   
4  4             Charmander   Fire     NaN    309  39      52       43   

   Sp. Atk  Sp. Def  Speed  Generation  Legendary  
0       65       65     45           1      False  
1       80       80     60           1      False  
2      100      100     80           1      False  
3      122      120     80           1      False  
4       60       50     65           1      False  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        8

414
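Counting rows without any missing value shows that something is missing, but not where. A per-column count can help here. The following is a minimal sketch on a small stand-in frame built from the five rows printed above (not the full dataset); `sample_df`, `missing_per_column` and `numeric_columns` are illustrative names, not part of the exercise.

```python
import numpy as np
import pandas as pd

# Stand-in data: a few columns of the five rows shown by head() above.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", np.nan],
    "HP": [45, 60, 80, 80, 39],
})

# Number of missing values in each column.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Names of the numerical columns.
numeric_columns = sample_df.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```

Applied to `pokemon_df`, the same two calls should show which columns account for the rows that `dropna` removes.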

Save the pokemon names in a variable called `pokemon_names`.

In [3]:
pokemon_names = pokemon_df["Name"]

Which types exist?

How many pokemon of each type are there roughly?

In [4]:
print(pokemon_df["Type 1"].nunique())
print(pokemon_df["Type 2"].nunique())
print(pokemon_df["Type 1"].value_counts()+pokemon_df["Type 2"].value_counts())

18
18
Bug          72
Dark         51
Dragon       50
Electric     50
Fairy        40
Fighting     53
Fire         64
Flying      101
Ghost        46
Grass        95
Ground       67
Ice          38
Normal      102
Poison       62
Psychic      90
Rock         58
Steel        49
Water       126
dtype: int64
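Adding two `value_counts()` results with `+` aligns them on the type name, so a type that appears in only one of the two columns would end up as `NaN`. It works above because both columns contain all 18 types. A more defensive sketch, using made-up toy values rather than the real dataset, could rely on `Series.add` with `fill_value=0`:

```python
import numpy as np
import pandas as pd

# Toy columns (hypothetical values, only for illustration).
type_1 = pd.Series(["Grass", "Fire", "Bug", "Bug"])
type_2 = pd.Series(["Poison", np.nan, "Flying", "Grass"])

counts_1 = type_1.value_counts()
counts_2 = type_2.value_counts()  # NaN entries are ignored by default

# Plain addition: labels present on only one side become NaN.
plain_sum = counts_1 + counts_2

# add() with fill_value treats a missing label as a count of 0.
combined = counts_1.add(counts_2, fill_value=0).astype(int)
print(combined)
```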


Imagine we didn't care about the numerical stats. Create a sub-dataframe called `non_numerical_df` that contains only the string and boolean columns.

In [5]:
non_numerical_df = pokemon_df[["Name", "Type 1", "Type 2", "Legendary"]]
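Listing the columns by hand works well for a handful of columns. As an alternative, the selection can be derived from the dtypes. The following is a sketch on a stand-in frame built from the rows printed earlier; `sample_df` and `non_numerical_sample` are illustrative names.

```python
import numpy as np
import pandas as pd

# Stand-in data: a few columns of the five rows shown by head() earlier.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", np.nan],
    "Total": [318, 405, 525, 625, 309],
    "Legendary": [False, False, False, False, False],
})

# Drop every numerical column; string and boolean columns remain.
non_numerical_sample = sample_df.select_dtypes(exclude="number")
print(non_numerical_sample.columns.tolist())
```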

Verify that the `Total` column, which contains the sum of all pokemon attributes, is calculated correctly. Assign a new column called "My Total" to the dataframe. The Python function `all(boolean_list)` checks whether all entries in a list are true.

In [6]:
pokemon_df["My Total"] = pokemon_df.HP + pokemon_df.Attack + pokemon_df.Defense + pokemon_df.Speed + pokemon_df["Sp. Atk"] + pokemon_df["Sp. Def"]
print(all(pokemon_df["My Total"] == pokemon_df["Total"]))

True
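The same check can be written by summing over a list of columns, which avoids the long chain of `+` operators. This is a sketch on the five rows printed by `head()` earlier; `stat_columns` and `totals_match` are illustrative names.

```python
import pandas as pd

# Stand-in data: the stat columns of the five rows shown by head() earlier.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Total": [318, 405, 525, 625, 309],
    "HP": [45, 60, 80, 80, 39],
    "Attack": [49, 62, 82, 100, 52],
    "Defense": [49, 63, 83, 123, 43],
    "Sp. Atk": [65, 80, 100, 122, 60],
    "Sp. Def": [65, 80, 100, 120, 50],
    "Speed": [45, 60, 80, 80, 65],
})

stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Row-wise sum over the selected columns.
sample_df["My Total"] = sample_df[stat_columns].sum(axis=1)

# Series.all() plays the role of the built-in all() here.
totals_match = (sample_df["My Total"] == sample_df["Total"]).all()
print(totals_match)
```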


Another common task that we haven't addressed yet is sorting data. Try to find a pandas method that sorts the pokemon names alphabetically.

In [7]:
print(pokemon_names.sort_values().head())

510                  Abomasnow
511    AbomasnowMega Abomasnow
68                        Abra
392                      Absol
393            AbsolMega Absol
Name: Name, dtype: object
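`sort_values` also exists on dataframes, where the `by` argument names the column or columns to sort on. A short sketch on the five rows printed by `head()` earlier (`sample_df`, `by_name` and `by_type_then_hp` are illustrative names):

```python
import pandas as pd

# Stand-in data: a few columns of the five rows shown by head() earlier.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "HP": [45, 60, 80, 80, 39],
})

# Sort whole rows alphabetically by name; the original index labels travel with the rows.
by_name = sample_df.sort_values(by="Name")
print(by_name)

# Sort by type first, then by HP from highest to lowest within each type.
by_type_then_hp = sample_df.sort_values(by=["Type 1", "HP"], ascending=[True, False])
print(by_type_then_hp)
```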


The first generation is obviously the only true one. Limit your dataframe to first-generation pokemon only and save it in the variable `first_generation_df`.

In [8]:
first_generation_df = pokemon_df[pokemon_df.Generation == 1]

How many legendary fire pokemon are there in total?

In [9]:
# Legendary is already boolean, so it can be used as a mask directly.
boolean_list = pokemon_df.Legendary & ((pokemon_df["Type 1"] == "Fire") | (pokemon_df["Type 2"] == "Fire"))
legendary_fire_df = pokemon_df[boolean_list]
print(len(legendary_fire_df))

8
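If only the count is needed, the boolean mask can be summed directly, because `True` counts as 1 and `False` as 0. The rows below are hand-written toy data for illustration, not an excerpt of the CSV file, and `toy_df`, `is_legendary_fire` and `legendary_fire_count` are illustrative names.

```python
import pandas as pd

# Hand-written toy rows (not read from the dataset).
toy_df = pd.DataFrame({
    "Name": ["Charizard", "Moltres", "Articuno", "Ho-oh"],
    "Type 1": ["Fire", "Fire", "Ice", "Fire"],
    "Type 2": ["Flying", "Flying", "Flying", "Flying"],
    "Legendary": [False, True, True, True],
})

is_fire = (toy_df["Type 1"] == "Fire") | (toy_df["Type 2"] == "Fire")
is_legendary_fire = toy_df["Legendary"] & is_fire

# Summing a boolean Series counts the True entries.
legendary_fire_count = int(is_legendary_fire.sum())
print(legendary_fire_count)
```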


How can we exclude Mega variants of the same pokemon from the main dataframe?

In [10]:
# Match "Mega " including the trailing space, so that regular names such as Meganium are kept.
df_without_mega = pokemon_df[~pokemon_df["Name"].str.contains("Mega ")]
print(df_without_mega["Name"].head())

0     Bulbasaur
1       Ivysaur
2      Venusaur
4    Charmander
5    Charmeleon
Name: Name, dtype: object
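One caveat with substring matching: a bare `"Mega"` pattern also matches regular names that merely start with those letters, such as Meganium. The sketch below compares both patterns on a handful of names. It assumes that Mega variants always contain `"Mega "` followed by a space, as in the names printed above; the variable names are illustrative.

```python
import pandas as pd

names = pd.Series([
    "Venusaur",
    "VenusaurMega Venusaur",
    "Meganium",
    "Charizard",
    "CharizardMega Charizard X",
])

# Bare substring: also drops Meganium.
kept_with_bare_pattern = names[~names.str.contains("Mega", regex=False)]

# Substring with trailing space: only drops the Mega variants.
kept_with_space_pattern = names[~names.str.contains("Mega ", regex=False)]

print(kept_with_bare_pattern.tolist())
print(kept_with_space_pattern.tolist())
```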


Bug types need a buff. Increase the HP of all bug pokemon to the maximum HP value in the dataset. Save the new values to a new column and print it.

On the other hand, this might be too much of a buff. The new column should instead contain an HP increase of +10 for bug pokemon.

In [11]:
max_hp = pokemon_df["HP"].max()
pokemon_df["Buffed HP"] = pokemon_df["HP"].where(~((pokemon_df["Type 1"] == "Bug") | (pokemon_df["Type 2"] == "Bug")), max_hp)
print(pokemon_df[["Name", "HP", "Buffed HP"]].head(15))

def row_transform(row):
    if row["Type 1"] == "Bug" or row["Type 2"] == "Bug":
        row["Buffed HP"] = row["HP"] + 10
    else:
        row["Buffed HP"] = row["HP"]
    return row

print(pokemon_df.apply(row_transform, axis=1)[["Name", "HP", "Buffed HP"]].head(15))


                         Name  HP  Buffed HP
0                   Bulbasaur  45         45
1                     Ivysaur  60         60
2                    Venusaur  80         80
3       VenusaurMega Venusaur  80         80
4                  Charmander  39         39
5                  Charmeleon  58         58
6                   Charizard  78         78
7   CharizardMega Charizard X  78         78
8   CharizardMega Charizard Y  78         78
9                    Squirtle  44         44
10                  Wartortle  59         59
11                  Blastoise  79         79
12    BlastoiseMega Blastoise  79         79
13                   Caterpie  45        255
14                    Metapod  50        255
                         Name  HP  Buffed HP
0                   Bulbasaur  45         45
1                     Ivysaur  60         60
2                    Venusaur  80         80
3       VenusaurMega Venusaur  80         80
4                  Charmander  39         39
5         
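Row-wise `apply` is easy to read but calls a Python function once per row. The +10 buff can also be expressed with a boolean mask and `Series.mask`, which replaces values where the condition is true. This is a sketch on a few rows whose HP values are taken from the output above; `sample_df` and `is_bug` are illustrative names.

```python
import numpy as np
import pandas as pd

# Stand-in data: HP values as printed above for these four pokemon.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Caterpie", "Metapod"],
    "Type 1": ["Grass", "Fire", "Bug", "Bug"],
    "Type 2": ["Poison", np.nan, np.nan, np.nan],
    "HP": [45, 39, 45, 50],
})

# Comparisons with NaN evaluate to False, so missing second types are handled.
is_bug = (sample_df["Type 1"] == "Bug") | (sample_df["Type 2"] == "Bug")

# Keep HP as it is, except where is_bug is True: there use HP + 10.
sample_df["Buffed HP"] = sample_df["HP"].mask(is_bug, sample_df["HP"] + 10)
print(sample_df[["Name", "HP", "Buffed HP"]])
```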

What are the highest Attack values in each generation?

What is the standard deviation of the HP in each generation?

In [12]:
aggregate = pokemon_df.groupby("Generation")
highest_attack = aggregate["Attack"].max()
std_HP = aggregate["HP"].std()

print(highest_attack)
print(std_HP)

print(pokemon_df[pokemon_df["Generation"]==1].Attack.max())
print(pokemon_df[pokemon_df["Generation"]==2].Attack.max())
print(pokemon_df[pokemon_df["Generation"]==3].Attack.max())
print(pokemon_df[pokemon_df["Generation"]==4].Attack.max())
print(pokemon_df[pokemon_df["Generation"]==5].Attack.max())
print(pokemon_df[pokemon_df["Generation"]==6].Attack.max())


Generation
1    190
2    185
3    180
4    170
5    170
6    160
Name: Attack, dtype: int64
Generation
1    28.153968
2    30.589359
3    24.059634
4    25.113604
5    22.407748
6    20.907822
Name: HP, dtype: float64
190
185
180
170
170
160
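Both statistics can also be collected into a single table with named aggregations. The sketch below uses made-up toy values rather than the real dataset, and the result column names `highest_attack` and `hp_std` are chosen only for illustration. Note that pandas computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Toy data (hypothetical values, only for illustration).
toy_df = pd.DataFrame({
    "Generation": [1, 1, 2, 2],
    "Attack": [49, 62, 80, 100],
    "HP": [45, 60, 50, 70],
})

# One row per generation, one column per named aggregation.
summary = toy_df.groupby("Generation").agg(
    highest_attack=("Attack", "max"),
    hp_std=("HP", "std"),
)
print(summary)
```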
