# Challenge 1

In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.

![Pokemon](../images/pokemon.jpg)

Follow the instructions below and enter your code.

#### Import all required libraries.

In [1]:
import pandas as pd
import numpy as np

#### Import data set.

Read the dataset `pokemon.csv` into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [2]:
pokemon = pd.read_csv('pokemon.csv')

#### Print first 10 rows of `pokemon`.

In [3]:
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

Fortunately, the owner of the `pokemon.csv` data set provided descriptions, which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we include the descriptions below. Read them and make sure you understand what each column means. This knowledge will help you work with the data.

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, which determines its weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have a second type |
| Total | A general guide to how strong a pokemon is |
| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of the generation |
| Legendary | True if the pokemon is Legendary, False otherwise |

#### Obtain the distinct values across `Type 1` and `Type 2`.

Extract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields.

In [4]:
# Extract all the values in Type 1 and Type 2.
pokemon[['Type 1', 'Type 2']]
# Get the unique values of each type column
unique_type1 = set(pokemon['Type 1'].unique())
unique_type2 = set(pokemon['Type 2'].unique())
# Keep the values present in both columns (an intersection, not a union);
# this also drops the NaN that appears only in Type 2
common_types = unique_type1.intersection(unique_type2)
# Wrap the set in a nested list (note: this is a list, not a NumPy array)
Common_Types = [[common_types]]
Common_Types

[[{'Bug',
   'Dark',
   'Dragon',
   'Electric',
   'Fairy',
   'Fighting',
   'Fire',
   'Flying',
   'Ghost',
   'Grass',
   'Ground',
   'Ice',
   'Normal',
   'Poison',
   'Psychic',
   'Rock',
   'Steel',
   'Water'}]]
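
The task asks for the distinct values across both columns, which is a union rather than an intersection. Below is a minimal sketch of one way to build it, assuming missing `Type 2` values should be excluded. The `toy` frame is made-up illustration data, not rows read from `pokemon.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", np.nan, "Flying", np.nan],
})

# Stack the two columns end to end, drop missing values, keep distinct ones.
all_types = pd.concat([toy["Type 1"], toy["Type 2"]], ignore_index=True)
distinct_types = np.asarray(all_types.dropna().unique(), dtype=object)

print(distinct_types)  # a 1-D NumPy array, in order of first appearance
```

Because the result is a real NumPy array, it can be passed directly to functions such as `np.isin`.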

#### Clean up `Name` values that contain "Mega".

If you look closely at the pokemon names, you will find that some of them contain junk text before the word "Mega". We want to clean up these names. For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

In [5]:
# Checking for unique values in 'Name'
unique_names = pokemon['Name'].unique()
# Print the unique names
print(unique_names)

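# Select the rows whose names contain 'Mega'
mega_names = pokemon[pokemon['Name'].str.contains('Mega')]
print(mega_names)

# Iterate over the names and rebuild those that contain a space.
# Caveat: this rewrites every name with a space, not only the Mega forms,
# which is why 'Mega Confined' and 'Mega Unbound' appear in the output.
for index, name in pokemon['Name'].items():
    parts = name.split(' ', 1)
    if len(parts) > 1:
        new_name = "Mega " + parts[1]
        pokemon.at[index, 'Name'] = new_name
pokemon['Name']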

['Bulbasaur' 'Ivysaur' 'Venusaur' 'VenusaurMega Venusaur' 'Charmander'
 'Charmeleon' 'Charizard' 'CharizardMega Charizard X'
 'CharizardMega Charizard Y' 'Squirtle' 'Wartortle' 'Blastoise'
 'BlastoiseMega Blastoise' 'Caterpie' 'Metapod' 'Butterfree' 'Weedle'
 'Kakuna' 'Beedrill' 'BeedrillMega Beedrill' 'Pidgey' 'Pidgeotto'
 'Pidgeot' 'PidgeotMega Pidgeot' 'Rattata' 'Raticate' 'Spearow' 'Fearow'
 'Ekans' 'Arbok' 'Pikachu' 'Raichu' 'Sandshrew' 'Sandslash' 'Nidoran♀'
 'Nidorina' 'Nidoqueen' 'Nidoran♂' 'Nidorino' 'Nidoking' 'Clefairy'
 'Clefable' 'Vulpix' 'Ninetales' 'Jigglypuff' 'Wigglytuff' 'Zubat'
 'Golbat' 'Oddish' 'Gloom' 'Vileplume' 'Paras' 'Parasect' 'Venonat'
 'Venomoth' 'Diglett' 'Dugtrio' 'Meowth' 'Persian' 'Psyduck' 'Golduck'
 'Mankey' 'Primeape' 'Growlithe' 'Arcanine' 'Poliwag' 'Poliwhirl'
 'Poliwrath' 'Abra' 'Kadabra' 'Alakazam' 'AlakazamMega Alakazam' 'Machop'
 'Machoke' 'Machamp' 'Bellsprout' 'Weepinbell' 'Victreebel' 'Tentacool'
 'Tentacruel' 'Geodude' 'Graveler' 'Golem' 'P

0          Bulbasaur
1            Ivysaur
2           Venusaur
3      Mega Venusaur
4         Charmander
           ...      
795          Diancie
796     Mega Diancie
797    Mega Confined
798     Mega Unbound
799        Volcanion
Name: Name, Length: 800, dtype: object
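
The clean-up can also be done without a loop. Below is a minimal sketch, assuming the junk text is whatever precedes `"Mega "` (with a trailing space) and that names without it should stay untouched. The `names` series is a small hypothetical sample, not the full column.

```python
import pandas as pd

# Hypothetical sample of names; two need cleaning, two must stay as they are.
names = pd.Series([
    "Venusaur",
    "VenusaurMega Venusaur",
    "CharizardMega Charizard X",
    "Mr. Mime",
])

# Remove the shortest prefix that is immediately followed by "Mega ".
# The lookahead (?=Mega ) keeps "Mega " itself in the result, and names
# that do not contain "Mega " produce no match, so they are unchanged.
cleaned = names.str.replace(r"^.*?(?=Mega )", "", regex=True)

print(cleaned.tolist())
# ['Venusaur', 'Mega Venusaur', 'Mega Charizard X', 'Mr. Mime']
```

Requiring the trailing space in the pattern is a design choice: it avoids touching names that merely start with the letters "Mega".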

#### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`.

For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1.

In [6]:
# Create a new column 'A/D Ratio' by dividing 'Attack' by 'Defense'
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']

# Display the updated DataFrame with the 'A/D Ratio' column
print(pokemon[['Name', 'Attack', 'Defense', 'A/D Ratio']])

              Name  Attack  Defense  A/D Ratio
0        Bulbasaur      49       49   1.000000
1          Ivysaur      62       63   0.984127
2         Venusaur      82       83   0.987952
3    Mega Venusaur     100      123   0.813008
4       Charmander      52       43   1.209302
..             ...     ...      ...        ...
795        Diancie     100      150   0.666667
796   Mega Diancie     160      110   1.454545
797  Mega Confined     110       60   1.833333
798   Mega Unbound     160       60   2.666667
799      Volcanion     110      120   0.916667

[800 rows x 4 columns]


#### Identify the pokemon with the highest `A/D Ratio`.

In [7]:
# Finding the highest 'A/D Ratio'
pokemon_high_ratio = pokemon['A/D Ratio'].max()

# Identify the Pokémon with the highest A/D Ratio
highest_ratio_pokemon = pokemon[pokemon['A/D Ratio'] == pokemon_high_ratio]

# The filtered result is already a DataFrame; wrap it and display it
highest_ratio_pokemon_1 = pd.DataFrame(highest_ratio_pokemon)

highest_ratio_pokemon_1

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
429,386,Mega Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0


#### Identify the pokemon with the lowest A/D Ratio.

In [8]:
# Finding the lowest 'A/D Ratio'
pokemon_low_ratio = pokemon['A/D Ratio'].min()

# Identify the Pokémon with the lowest A/D Ratio
lowest_ratio_pokemon = pokemon[pokemon['A/D Ratio'] == pokemon_low_ratio]

# The filtered result is already a DataFrame; wrap it and display it
lowest_ratio_pokemon_1 = pd.DataFrame(lowest_ratio_pokemon)
lowest_ratio_pokemon_1

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
230,213,Shuckle,Bug,Rock,505,20,10,230,10,230,5,2,False,0.043478
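
An alternative to comparing against `max()` or `min()` is to look up the row label of the extreme value directly. The sketch below uses `idxmax` and `idxmin` on a made-up `toy` frame (hypothetical names and stats). Unlike the boolean mask above, these return only the first row when several rows tie.

```python
import pandas as pd

# Hypothetical toy data; the names and stats are illustrative only.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Attack": [50, 90, 10],
    "Defense": [50, 20, 200],
})
toy["A/D Ratio"] = toy["Attack"] / toy["Defense"]

# idxmax/idxmin give the index label of the first largest/smallest value.
strongest = toy.loc[toy["A/D Ratio"].idxmax()]
weakest = toy.loc[toy["A/D Ratio"].idxmin()]

print(strongest["Name"], weakest["Name"])  # B C
```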


#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Rules:

* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of `<Type 1>-<Type 2>`. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.

* If `Type 1` has a valid value but `Type 2` does not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`.

In [9]:
# Create a new column 'Combo Type' based on 'Type 1' and 'Type 2'
# Define a helper that builds the combined value for one row
def create_combo_type(row):
    type1 = row['Type 1']
    type2 = row['Type 2']
    
    if pd.notna(type1) and pd.notna(type2):
        return type1 + '-' + type2
    elif pd.notna(type1):
        return type1
    elif pd.notna(type2):
        return type2
    else:
        return None  # If both 'Type 1' and 'Type 2' are missing

pokemon['Combo Type'] = pokemon.apply(create_combo_type, axis=1)

# Check for results
pokemon[['Name', 'Type 1', 'Type 2', 'Combo Type']]

Unnamed: 0,Name,Type 1,Type 2,Combo Type
0,Bulbasaur,Grass,Poison,Grass-Poison
1,Ivysaur,Grass,Poison,Grass-Poison
2,Venusaur,Grass,Poison,Grass-Poison
3,Mega Venusaur,Grass,Poison,Grass-Poison
4,Charmander,Fire,,Fire
...,...,...,...,...
795,Diancie,Rock,Fairy,Rock-Fairy
796,Mega Diancie,Rock,Fairy,Rock-Fairy
797,Mega Confined,Psychic,Ghost,Psychic-Ghost
798,Mega Unbound,Psychic,Dark,Psychic-Dark
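
The same rules can be expressed without a row-wise `apply`. Below is a minimal vectorized sketch, assuming `Type 1` is never missing. The `toy` frame is made-up illustration data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data covering both rules.
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water"],
    "Type 2": ["Poison", np.nan, "Dark"],
})

# str.cat yields a missing value wherever either side is missing,
# so rows without a Type 2 can then fall back to Type 1 alone.
combined = toy["Type 1"].str.cat(toy["Type 2"], sep="-")
toy["Combo Type"] = combined.fillna(toy["Type 1"])

print(toy["Combo Type"].tolist())  # ['Grass-Poison', 'Fire', 'Water-Dark']
```

On a frame of this size the difference is negligible, but column-wise string operations generally scale better than calling a Python function per row.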


#### Identify the pokemon whose `A/D Ratio` is among the top 5.

In [10]:
# Find the 5 rows with the largest 'A/D Ratio'
top_five_ratios = pokemon.nlargest(5, 'A/D Ratio')  # '.nsmallest' would return the bottom results instead
top_five_ratios

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
429,386,Mega Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0,Psychic
347,318,Carvanha,Water,Dark,305,45,90,20,65,20,65,3,False,4.5,Water-Dark
19,15,Mega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.75,Bug-Poison
453,408,Cranidos,Rock,,350,67,125,40,30,30,58,4,False,3.125,Rock
348,319,Sharpedo,Water,Dark,460,70,120,40,95,40,95,3,False,3.0,Water-Dark


#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.

Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`.

In [11]:
distinct_combo_types = []

# Iterate through the top 5 Pokémon with the highest A/D Ratios
for combo_type in top_five_ratios['Combo Type']:
    # Check if the Combo Type is not in the list, and then add it
    if combo_type not in distinct_combo_types:
        distinct_combo_types.append(combo_type)

# Print the list of distinct Combo Type values
print(distinct_combo_types)  # only 4 values for 5 pokemon, because 'Water-Dark' appears twice

['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']
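
The loop above can be replaced by a single pandas call. The sketch below rebuilds only the `Combo Type` column of the top five (values copied from the table above) in a stand-in frame named `top`, which is a hypothetical name, and uses `drop_duplicates` to keep the order of first appearance.

```python
import pandas as pd

# Stand-in for top_five_ratios, holding only the column of interest.
top = pd.DataFrame({
    "Combo Type": ["Psychic", "Water-Dark", "Bug-Poison", "Rock", "Water-Dark"],
})

# drop_duplicates keeps the first occurrence of each value, in order.
distinct = top["Combo Type"].drop_duplicates().tolist()

print(distinct)  # ['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']
```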


#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.

Your output should look like below:

![Aggregate](../images/aggregated-mean.png)

In [12]:
# Non-numeric columns cannot be averaged, so restrict the mean to numeric fields.
# Note: this groups only the top 5 'A/D Ratio' rows, not every pokemon with these Combo Types.
combo_type_means = top_five_ratios.groupby('Combo Type').mean(numeric_only=True)

# Print the mean scores for each Combo Type
combo_type_means

Combo Type,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
Bug-Poison,15.0,495.0,65.0,150.0,40.0,15.0,80.0,145.0,1.0,0.0,3.75
Psychic,386.0,600.0,50.0,180.0,20.0,180.0,20.0,150.0,3.0,1.0,9.0
Rock,408.0,350.0,67.0,125.0,40.0,30.0,30.0,58.0,4.0,0.0,3.125
Water-Dark,318.5,382.5,57.5,105.0,30.0,80.0,30.0,80.0,3.0,0.0,3.75
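
The task statement asks for means across all pokemon that share the selected `Combo Type` values, which means filtering the full frame before grouping. Below is a minimal sketch of that pattern on made-up data. The `toy` frame and the `wanted` list are hypothetical stand-ins for `pokemon` and the list of distinct combo types.

```python
import pandas as pd

# Hypothetical toy data; "Fire" is deliberately not in the wanted list.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Combo Type": ["Rock", "Rock", "Fire", "Water-Dark"],
    "Attack": [100, 50, 60, 90],
    "Defense": [40, 60, 60, 20],
})
wanted = ["Rock", "Water-Dark"]

# Keep every row whose Combo Type is in the list, then average per group.
subset = toy[toy["Combo Type"].isin(wanted)]
means = subset.groupby("Combo Type").mean(numeric_only=True)

print(means)
```

Passing `numeric_only=True` skips text columns such as `Name`, which would otherwise raise an error in recent pandas versions.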
