# Challenge 1

In this challenge you will work with a **Pokemon** data set, answering a series of questions to practice dataframe calculation, aggregation, and transformation.

![Pokemon](../images/pokemon.jpg)

Follow the instructions below and enter your code.

#### Import all required libraries.

In [4]:
# import libraries
import pandas as pd
import numpy as np

#### Import data set.

Read the data set `Pokemon.csv` into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [5]:
# import dataset
pokemon = pd.read_csv('Pokemon.csv')
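
Before moving on, it can help to confirm what `read_csv` produced. The sketch below runs a few such checks on four rows copied from the output of the next cell, read from an in-memory string so it runs without the file (`csv_text` and `sample` are illustrative names, not part of the challenge).

```python
import io

import pandas as pd

# Four rows copied from the first rows displayed in this notebook,
# standing in for Pokemon.csv
csv_text = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (rows, columns)
print(sample.dtypes)        # stats parse as integers, Legendary as bool
print(sample.isna().sum())  # empty Type 2 cells are read as NaN
```

The same three calls on `pokemon` show how many rows were loaded and how many single-type pokemon have an empty `Type 2`.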

#### Print first 10 rows of `pokemon`.

In [6]:
# your code here
pokemon.head(10)

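|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water | NaN | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |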


When you look at a data set, you will often wonder what each column means. Some open-source data sets come with descriptions of their columns, and such descriptions help analysts work efficiently and avoid misreading the data.

Fortunately, the owner of the `Pokemon.csv` data set provides descriptions, which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, they are reproduced below. Read them and make sure you understand what each column means; this will help you work with the data.

| Column | Description |
| --- | --- |
| # | ID (Pokédex number) of each pokemon; alternate forms such as Mega Evolutions share the ID of their base form |
| Name | Name of each pokemon |
| Type 1 | The primary type of each pokemon; it determines weaknesses and resistances to attacks |
| Type 2 | The secondary type of dual-type pokemon; empty (`NaN`) for single-type pokemon |
| Total | The sum of all the stats that follow; a general guide to how strong a pokemon is |
| HP | Hit points, or health; defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack; the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | The generation number of the pokemon |
| Legendary | `True` if the pokemon is Legendary, `False` otherwise |
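
On the rows printed earlier, `Total` equals the sum of the six base stats (`HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`). A quick way to check that is sketched below on three of those rows; whether it holds for every row of the full file is an assumption you can test by running the same comparison on `pokemon`.

```python
import pandas as pd

# Stats copied from three of the rows displayed earlier in this notebook
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Squirtle'],
    'Total': [318, 309, 314],
    'HP': [45, 39, 44],
    'Attack': [49, 52, 48],
    'Defense': [49, 43, 65],
    'Sp. Atk': [65, 60, 50],
    'Sp. Def': [65, 50, 64],
    'Speed': [45, 65, 43],
})

stat_cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Row-wise sum of the six base stats
stat_sum = sample[stat_cols].sum(axis=1)

# True for every row whose Total matches that sum
matches = stat_sum.eq(sample['Total'])
print(matches.all())
```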

#### Obtain the distinct values across `Type 1` and `Type 2`.

Extract all the values in `Type 1` and `Type 2`, then create an array containing the distinct values across both fields.
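
As a compact sketch of the same idea, the two columns can be stacked into one Series with `pd.concat`, missing values dropped, and `unique()` applied once. The three-row table below is hypothetical; `unique()` keeps values in the order they are first seen.

```python
import pandas as pd

# Hypothetical mini table with the same two type columns
df = pd.DataFrame({
    'Type 1': ['Grass', 'Fire', 'Fire', 'Water'],
    'Type 2': ['Poison', None, 'Flying', None],
})

# Stack both columns, drop missing Type 2 values, keep first-seen order
distinct_types = pd.concat([df['Type 1'], df['Type 2']]).dropna().unique()
print(distinct_types)
```

If a sorted result is preferred, `np.union1d` applied to the two columns after `dropna()` returns their union in sorted order.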

In [16]:
# your code here
# Extract the values from 'Type 1' and 'Type 2'
type1_values = pokemon['Type 1'].dropna().unique()  # Dropping NaN values
type2_values = pokemon['Type 2'].dropna().unique()  # Dropping NaN values

# Combine both arrays
combined_types = pd.Series(list(type1_values) + list(type2_values))

# Find unique values across both columns
distinct_values = combined_types.unique()

print(distinct_values)


['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'
 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'
 'Flying']


#### Clean up `Name` values that contain "Mega".

If you look at the pokemon names carefully, you will notice junk text in the names that contain "Mega". We want to clean up these names: for instance, "VenusaurMega Venusaur" should become "Mega Venusaur", and "CharizardMega Charizard X" should become "Mega Charizard X".
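
A minimal sketch of one way to do this: remove everything that precedes "Mega" with a regular expression, so that "Mega" and the form name after it are kept. It runs on a small hand-made Series. Simply deleting the substring "Mega" is shown for contrast: it leaves the duplicated base name behind and also breaks names such as Meganium, which start with "Mega" but are not Mega Evolutions.

```python
import pandas as pd

names = pd.Series([
    'Venusaur',
    'VenusaurMega Venusaur',
    'CharizardMega Charizard X',
    'Meganium',  # a regular name that merely starts with "Mega"
])

# Naive approach: deleting every "Mega" keeps the junk prefix and mangles Meganium
naive = names.str.replace('Mega', '', regex=False).str.strip()

# Regex approach: drop everything before "Mega"; the lookahead keeps "Mega" itself
cleaned = names.str.replace(r'^.*(?=Mega)', '', regex=True)

print(naive.tolist())
print(cleaned.tolist())
```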

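In [18]:
# your code here
# Drop everything before "Mega" (e.g. "VenusaurMega Venusaur" -> "Mega Venusaur").
# The lookahead keeps "Mega" itself, and names that merely start with "Mega"
# (such as Meganium) are left unchanged.
pokemon['Name'] = pokemon['Name'].str.replace(r'^.*(?=Mega)', '', regex=True)

# test transformed data
pokemon.head(10)

|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | Mega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | Mega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water | NaN | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |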


#### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`.

For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1.
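
The ratio is a plain element-wise division of two columns. The sketch below uses a hypothetical three-row table whose last row has `Defense` equal to 0, to show that pandas returns `inf` rather than raising an error, and how such values can be turned into `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the last row has Defense = 0 to show division by zero
df = pd.DataFrame({'Attack': [49, 52, 90], 'Defense': [49, 43, 0]})

ratio = df['Attack'] / df['Defense']  # 49/49 = 1.0; 90/0 gives inf, not an error

# Reassign the cleaned Series instead of modifying the column in place
df['A/D Ratio'] = ratio.replace([np.inf, -np.inf], np.nan)
print(df)
```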

In [20]:
# your code here
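pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']
# Guard against division by zero: turn any infinite ratios into NaN.
# Reassign instead of chaining replace(..., inplace=True) on the column,
# which recent pandas versions warn against.
pokemon['A/D Ratio'] = pokemon['A/D Ratio'].replace([np.inf, -np.inf], np.nan)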

pokemon.head()

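|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.0 |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 |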


#### Identify the pokemon with the highest `A/D Ratio`.

In [22]:
# your code here
max_ad_ratio = pokemon['A/D Ratio'].max()
max_ad_pokemon = pokemon[pokemon['A/D Ratio'] == max_ad_ratio]  # Get the row with the maximum value
max_ad_pokemon

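|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 429 | 386 | DeoxysAttack Forme | Psychic | NaN | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True | 9.0 |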


#### Identify the pokemon with the lowest `A/D Ratio`.

In [23]:
# your code here
min_ad_ratio = pokemon['A/D Ratio'].min()
min_ad_pokemon = pokemon[pokemon['A/D Ratio'] == min_ad_ratio]
min_ad_pokemon

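|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 230 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False | 0.043478 |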
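
The boolean mask used in the two cells above keeps every row that ties for the extreme value, while `Series.idxmax()` and `Series.idxmin()` return only the index label of the first such row. The sketch below contrasts the two on a hypothetical table with made-up ratios that include a tie; which one you want depends on whether ties should all be reported.

```python
import pandas as pd

# Hypothetical table with made-up ratios; A and C tie for the maximum
df = pd.DataFrame({
    'Name': ['Pokemon A', 'Pokemon B', 'Pokemon C', 'Pokemon D'],
    'A/D Ratio': [9.0, 0.5, 9.0, 0.04],
})

# Boolean mask: returns every row equal to the maximum
all_max = df[df['A/D Ratio'] == df['A/D Ratio'].max()]

# idxmax / idxmin: return the label of the first extreme row only
first_max = df.loc[df['A/D Ratio'].idxmax()]
first_min = df.loc[df['A/D Ratio'].idxmin()]

print(all_max['Name'].tolist())
print(first_max['Name'], first_min['Name'])
```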


#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Rules:

* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form `<Type 1>-<Type 2>`. For example, if `Type 1` is `Grass` and `Type 2` is `Poison`, `Combo Type` will be `Grass-Poison`.

* If `Type 1` has a valid value but `Type 2` does not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` and `Type 2` is `NaN`, `Combo Type` will be `Fire`.
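
The same rules can also be expressed without a row-wise `apply`. In the sketch below (a hypothetical three-row table), concatenating the strings yields a missing value wherever `Type 2` is missing, because pandas propagates missing values through string concatenation, and `fillna` then falls back to `Type 1`. On a large frame this vectorized form is usually faster than `apply(axis=1)`.

```python
import pandas as pd

# Hypothetical table; the second row is single-type
df = pd.DataFrame({
    'Type 1': ['Grass', 'Fire', 'Fire'],
    'Type 2': ['Poison', None, 'Flying'],
})

# "<Type 1>-<Type 2>"; rows with a missing Type 2 become missing here...
combined = df['Type 1'] + '-' + df['Type 2']

# ...and fillna replaces those with Type 1 alone
df['Combo Type'] = combined.fillna(df['Type 1'])
print(df['Combo Type'].tolist())
```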

In [25]:
# Define a function to combine Type 1 and Type 2
def create_combo_type(row):
    if pd.notna(row['Type 1']) and pd.notna(row['Type 2']):
        return f"{row['Type 1']}-{row['Type 2']}"
    elif pd.notna(row['Type 1']):
        return row['Type 1']
    else:
        return pd.NA

# Apply the function to create the Combo Type column
pokemon['Combo Type'] = pokemon.apply(create_combo_type, axis=1)

# Display the first few rows to check the new column
print(pokemon.head())


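|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.000000 | Grass-Poison |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 | Grass-Poison |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 | Grass-Poison |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 | Grass-Poison |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 | Fire |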


#### Identify the pokemon whose `A/D Ratio` is among the top 5.

In [26]:
# your code here
# Sort the DataFrame by A/D Ratio in descending order
top_5_ad_ratio = pokemon.sort_values(by='A/D Ratio', ascending=False).head(5)

# Display the details of the top 5 Pokémon with the highest A/D Ratio
print(top_5_ad_ratio)


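|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 429 | 386 | DeoxysAttack Forme | Psychic | NaN | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True | 9.000 | Psychic |
| 347 | 318 | Carvanha | Water | Dark | 305 | 45 | 90 | 20 | 65 | 20 | 65 | 3 | False | 4.500 | Water-Dark |
| 19 | 15 | Mega Beedrill | Bug | Poison | 495 | 65 | 150 | 40 | 15 | 80 | 145 | 1 | False | 3.750 | Bug-Poison |
| 453 | 408 | Cranidos | Rock | NaN | 350 | 67 | 125 | 40 | 30 | 30 | 58 | 4 | False | 3.125 | Rock |
| 348 | 319 | Sharpedo | Water | Dark | 460 | 70 | 120 | 40 | 95 | 40 | 95 | 3 | False | 3.000 | Water-Dark |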


#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.

Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`.
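
A short sketch of the whole step: `DataFrame.nlargest` is documented as equivalent to `sort_values(..., ascending=False).head(n)` but more performant, and `unique().tolist()` turns the selected `Combo Type` values into a plain list in first-seen order. The table below is hypothetical; its ratios and types loosely mirror the top-5 output above, plus one extra row that should be excluded.

```python
import pandas as pd

# Hypothetical table; P6 has a low ratio and should not make the top 5
df = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'A/D Ratio': [9.0, 4.5, 3.75, 3.125, 3.0, 0.5],
    'Combo Type': ['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock', 'Water-Dark', 'Grass'],
})

# Top 5 rows by A/D Ratio, highest first
top_5 = df.nlargest(5, 'A/D Ratio')

# Distinct Combo Type values as a plain Python list, in first-seen order
unique_combo_types = top_5['Combo Type'].unique().tolist()
print(unique_combo_types)
```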

In [27]:
# your code here
# Step 1: Sort the DataFrame by A/D Ratio in descending order and get the top 5
top_5_ad_ratio = pokemon.sort_values(by='A/D Ratio', ascending=False).head(5)

# Step 2: Extract the Combo Type values from these top 5 Pokémon
combo_types = top_5_ad_ratio['Combo Type']

# Step 3: Get the unique Combo Type values
unique_combo_types = combo_types.unique().tolist()

# Display the list of unique Combo Type values
print(unique_combo_types)


['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']


#### For each of the `Combo Type` values obtained in the previous question, calculate the mean of every numeric field across all pokemon of that `Combo Type`.

Your output should look like this:

![Aggregate](../images/aggregated-mean.png)
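
The loop in the next cell filters and averages one `Combo Type` at a time. A more compact sketch uses `isin` to keep the wanted types and `groupby(...).mean()` to average them in one pass; it runs on a hypothetical four-row table. It also drops `#`, since averaging Pokédex IDs produces a number with no real meaning, and uses `reindex` because `groupby` sorts the group keys alphabetically by default.

```python
import pandas as pd

# Hypothetical rows with only the columns needed for the illustration
df = pd.DataFrame({
    '#': [1, 2, 4, 5],
    'Combo Type': ['Grass-Poison', 'Grass-Poison', 'Fire', 'Fire'],
    'HP': [45, 60, 39, 58],
    'Attack': [49, 62, 52, 64],
})

wanted = ['Grass-Poison', 'Fire']

# Keep the wanted combo types, drop the ID column, then average per group
means = (
    df[df['Combo Type'].isin(wanted)]
    .drop(columns='#')
    .groupby('Combo Type')
    .mean(numeric_only=True)
    .reindex(wanted)  # restore the order of the wanted list
)
print(means)
```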

In [32]:
# your code here
# Reuse the unique Combo Type values obtained in the previous question
combo_types = unique_combo_types

# Initialize an empty dictionary to store the mean scores for each Combo Type
mean_scores_by_combo_type = {}

# Loop through each unique Combo Type
for combo_type in combo_types:
    # Filter the DataFrame by the current Combo Type
    filtered_pokemon = pokemon[pokemon['Combo Type'] == combo_type]

    # Calculate the mean of all numeric fields for this Combo Type
    mean_scores = filtered_pokemon.mean(numeric_only=True)

    # Store the mean scores in the dictionary with Combo Type as the key
    mean_scores_by_combo_type[combo_type] = mean_scores

# Convert the dictionary to a DataFrame for better readability
mean_scores_df = pd.DataFrame(mean_scores_by_combo_type).T

# Display the resulting DataFrame
print(mean_scores_df)

# Note: '#' is the Pokédex ID, so its mean is not a count of pokemon and has no
# real meaning; Legendary is averaged as 0/1, giving the share of Legendary pokemon.


|  | # | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Psychic | 381.973684 | 464.552632 | 72.552632 | 64.947368 | 67.236842 | 98.552632 | 82.394737 | 78.868421 | 3.342105 | 0.236842 | 1.164196 |
| Water-Dark | 347.666667 | 493.833333 | 69.166667 | 120.000000 | 65.166667 | 88.833333 | 63.500000 | 87.166667 | 3.166667 | 0.000000 | 2.291949 |
| Bug-Poison | 199.166667 | 347.916667 | 53.750000 | 68.333333 | 58.083333 | 42.500000 | 59.333333 | 65.916667 | 2.333333 | 0.000000 | 1.315989 |
| Rock | 410.111111 | 409.444444 | 67.111111 | 103.333333 | 107.222222 | 40.555556 | 58.333333 | 32.888889 | 3.888889 | 0.111111 | 1.260091 |
