# Challenge 1

In this challenge you will work with a Pokemon data set. You will answer a series of questions to practice dataframe calculation, aggregation, and transformation.

![Pokemon](pokemon.jpg)

Follow the instructions below and enter your code.

### Import all required libraries

```python
# import libraries
import pandas as pd
import re
```

### Import data set

Import the data set `Pokemon.csv` from the `your-code` directory of this lab and read it into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

```python
# import data set
pokemon = pd.read_csv('Pokemon.csv')
```

### Print the first 10 rows of `pokemon`

```python
# enter your code here
pokemon.head(10)
```

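|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire |  | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water |  | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |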


When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions as follows:

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type; this determines weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have a second type |
| Total | Sum of all stats that come after this, a general guide to how strong a pokemon is |
| HP | Hit points, or health; defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Generation number |
| Legendary | True if the pokemon is Legendary, False if not |

### Print the distinct values in `Type 1` and `Type 2` combined

```python
# List each distinct (Type 1, Type 2) pair with its count.
# Note: groupby drops rows whose Type 2 is NaN by default.
combined = pokemon.groupby(['Type 1', 'Type 2']).size().reset_index(name='count')
combined.head(15)
```

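|  | Type 1 | Type 2 | count |
| --- | --- | --- | --- |
| 0 | Bug | Electric | 2 |
| 1 | Bug | Fighting | 2 |
| 2 | Bug | Fire | 2 |
| 3 | Bug | Flying | 14 |
| 4 | Bug | Ghost | 1 |
| 5 | Bug | Grass | 6 |
| 6 | Bug | Ground | 2 |
| 7 | Bug | Poison | 12 |
| 8 | Bug | Rock | 3 |
| 9 | Bug | Steel | 7 |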
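The cell above lists the distinct (`Type 1`, `Type 2`) pairs rather than the distinct type names, and because `groupby` drops rows whose key is `NaN` by default, single-type pokemon such as Charmander are left out. If the question is read as asking for every type name that appears in either column, a minimal sketch is to stack the two columns and keep the unique non-null values. The small frame `sample` below is a hypothetical stand-in for `pokemon`; on the real data the same `pd.concat(...)` line would be applied to `pokemon` itself.

```python
import pandas as pd

# Hypothetical stand-in for `pokemon`, using type combinations seen in the data set
sample = pd.DataFrame({
    'Type 1': ['Grass', 'Fire', 'Fire', 'Water', 'Bug'],
    'Type 2': ['Poison', None, 'Flying', None, 'Poison'],
})

# Stack both type columns into one Series, drop missing Type 2 values,
# and keep each type name once
distinct_types = pd.concat([sample['Type 1'], sample['Type 2']]).dropna().unique()
print(sorted(distinct_types))
```

`len(distinct_types)` then gives the number of distinct types.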


Check out the pokemon names in the first 10 rows. Names that contain "Mega" have junk text in front of them. For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

### Clean up `Name` values that contain "Mega"

```python
# Keep only the text from the first 'Mega' onward,
# e.g. 'VenusaurMega Venusaur' -> 'Mega Venusaur'
def rename_mega(name):
    match = re.search('Mega', name)
    return name[match.start():] if match else name

pokemon['Name'] = pokemon['Name'].apply(rename_mega)

pokemon.head(10)
```

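|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire |  | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | Mega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | Mega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water |  | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |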
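The `apply` approach above processes one name at a time. As an alternative, here is a sketch using pandas' vectorized `Series.str.replace` with a regular expression that, like `rename_mega`, drops everything before the first "Mega". The `names` Series is a hypothetical sample of raw names, not the full column.

```python
import pandas as pd

# Hypothetical sample of raw names, as they appear before cleanup
names = pd.Series(['Bulbasaur', 'VenusaurMega Venusaur',
                   'CharizardMega Charizard X', 'Meganium'])

# Lazily match the shortest prefix that is followed by 'Mega' and delete it.
# The lookahead (?=Mega) keeps 'Mega' itself; names without 'Mega' don't match.
cleaned = names.str.replace(r'^.*?(?=Mega)', '', regex=True)
print(cleaned.tolist())
```

Names such as "Meganium", which already begin with "Mega", match with an empty prefix and come through unchanged, just as with `rename_mega`.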


### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`

For instance, pokemon #1 has an Attack score of 49 and a Defense score of 49, so its `A/D Ratio` is 49/49 = 1.

```python
# enter your code here
pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']

# test transformed data
pokemon.head(10)
```

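|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.0 |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 |
| 5 | 5 | Charmeleon | Fire |  | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False | 1.103448 |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False | 1.076923 |
| 7 | 6 | Mega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False | 1.171171 |
| 8 | 6 | Mega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False | 1.333333 |
| 9 | 7 | Squirtle | Water |  | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False | 0.738462 |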
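As a quick check of the formula, the sketch below recomputes the ratio for three rows whose `Attack` and `Defense` values are copied from the table above. Dividing one Series by another works element-wise, aligned on the index, and rounding to six decimals reproduces the values pandas displays by default.

```python
import pandas as pd

# Attack/Defense values copied from three rows of the table above
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Squirtle'],
    'Attack': [49, 52, 48],
    'Defense': [49, 43, 65],
})

# Element-wise division of two Series, aligned on the index
sample['A/D Ratio'] = sample['Attack'] / sample['Defense']
print(sample['A/D Ratio'].round(6).tolist())
```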


### Print the pokemon with the highest `A/D Ratio`

```python
# enter your code here
pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].max()]
```

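|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 429 | 386 | DeoxysAttack Forme | Psychic |  | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True | 9.0 |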


### Print the pokemon with the lowest `A/D Ratio`

```python
# enter your code here
pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].min()]
```

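|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 230 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False | 0.043478 |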
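Comparing against `.max()` and `.min()` returns every row tied at the extreme, which is a sensible choice. A sketch of the same lookups with `DataFrame.nlargest` and `DataFrame.nsmallest` follows; `keep='all'` preserves ties the same way, whereas `Series.idxmax` returns only the first tied label. The data here is a hypothetical toy frame with a deliberate tie.

```python
import pandas as pd

# Hypothetical toy data with a tie for the highest ratio
df = pd.DataFrame({
    'Name': ['Tie 1', 'Tie 2', 'Middle', 'Lowest'],
    'A/D Ratio': [9.0, 9.0, 1.0, 0.043478],
})

# keep='all' returns every row tied at the boundary, like the boolean mask
highest = df.nlargest(1, 'A/D Ratio', keep='all')
lowest = df.nsmallest(1, 'A/D Ratio', keep='all')

# idxmax, by contrast, returns only the first index label among ties
first_max = df.loc[df['A/D Ratio'].idxmax()]
print(highest['Name'].tolist(), lowest['Name'].tolist(), first_max['Name'])
```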


### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Conditions:

* If `Type 2` value is a valid string, the `Combo Type` value should be `<Type 1>-<Type 2>` (e.g. `Grass-Poison`).

* If `Type 2` value is `NaN`, the `Combo Type` value should be the same as `Type 1` which always exists.

*Hint: Consider using function and `apply`.*

```python
# Join the two types with '-'; Type 2 is NaN (a float, not a str)
# for single-type pokemon, in which case only Type 1 is used
def combine(type1, type2):
    if isinstance(type2, str):
        return type1 + '-' + type2
    return type1

pokemon['Combo Type'] = pokemon.apply(lambda row: combine(row['Type 1'], row['Type 2']), axis=1)

# test transformed data
pokemon.head(10)
```

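|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.0 | Grass-Poison |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 | Grass-Poison |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 | Grass-Poison |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 | Grass-Poison |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 | Fire |
| 5 | 5 | Charmeleon | Fire |  | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False | 1.103448 | Fire |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False | 1.076923 | Fire-Flying |
| 7 | 6 | Mega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False | 1.171171 | Fire-Dragon |
| 8 | 6 | Mega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False | 1.333333 | Fire-Flying |
| 9 | 7 | Squirtle | Water |  | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False | 0.738462 | Water |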
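The hint suggests `apply`, which works row by row. A vectorized sketch gives the same result by relying on the fact that pandas string concatenation propagates `NaN`: rows without a `Type 2` become `NaN` and are then filled with `Type 1` alone. The frame below is a hypothetical sample, not the full data set.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mirroring the data set
df = pd.DataFrame({
    'Type 1': ['Grass', 'Fire', 'Fire'],
    'Type 2': ['Poison', np.nan, 'Flying'],
})

# 'Type 1-Type 2' where Type 2 exists; NaN otherwise, then filled with Type 1
df['Combo Type'] = (df['Type 1'] + '-' + df['Type 2']).fillna(df['Type 1'])
print(df['Combo Type'].tolist())
```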


### Print `Combo Type` for the pokemon whose `A/D Ratio` values are among the top 5

```python
# enter your code here
result = pokemon.sort_values('A/D Ratio', ascending=False).head(5)
result['Combo Type']
```

```text
429       Psychic
347    Water-Dark
19     Bug-Poison
453          Rock
348    Water-Dark
Name: Combo Type, dtype: object
```
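Since `Water-Dark` appears twice in the top 5, it can help to reduce the list to its distinct values before filtering. A sketch, assuming a hypothetical toy frame: `nlargest(5, 'A/D Ratio')` returns the same rows as `sort_values('A/D Ratio', ascending=False).head(5)` (up to the order of ties), and `unique()` keeps the first-seen order while dropping repeats.

```python
import pandas as pd

# Hypothetical rows; two of the top-5 ratios share the same Combo Type
df = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'A/D Ratio': [9.0, 4.5, 3.75, 3.5, 3.0, 0.5],
    'Combo Type': ['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock', 'Water-Dark', 'Grass'],
})

# Top 5 rows by ratio, in descending order
top5 = df.nlargest(5, 'A/D Ratio')

# Distinct Combo Types among them, in first-seen order
top_combo_types = top5['Combo Type'].unique()
print(list(top_combo_types))
```

The resulting array can be passed directly to `isin` when filtering.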

```python
# Keep every pokemon whose Combo Type is one of the top-5 Combo Types
pokemon_filtrado = pokemon[pokemon['Combo Type'].isin(result['Combo Type'])]
pokemon_filtrado.head(15)
```

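|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | 13 | Weedle | Bug | Poison | 195 | 40 | 35 | 30 | 20 | 20 | 50 | 1 | False | 1.166667 | Bug-Poison |
| 17 | 14 | Kakuna | Bug | Poison | 205 | 45 | 25 | 50 | 25 | 25 | 35 | 1 | False | 0.5 | Bug-Poison |
| 18 | 15 | Beedrill | Bug | Poison | 395 | 65 | 90 | 40 | 45 | 80 | 75 | 1 | False | 2.25 | Bug-Poison |
| 19 | 15 | Mega Beedrill | Bug | Poison | 495 | 65 | 150 | 40 | 15 | 80 | 145 | 1 | False | 3.75 | Bug-Poison |
| 53 | 48 | Venonat | Bug | Poison | 305 | 60 | 55 | 50 | 40 | 55 | 45 | 1 | False | 1.1 | Bug-Poison |
| 54 | 49 | Venomoth | Bug | Poison | 450 | 70 | 65 | 60 | 90 | 75 | 90 | 1 | False | 1.083333 | Bug-Poison |
| 68 | 63 | Abra | Psychic |  | 310 | 25 | 20 | 15 | 105 | 55 | 90 | 1 | False | 1.333333 | Psychic |
| 69 | 64 | Kadabra | Psychic |  | 400 | 40 | 35 | 30 | 120 | 70 | 105 | 1 | False | 1.166667 | Psychic |
| 70 | 65 | Alakazam | Psychic |  | 500 | 55 | 50 | 45 | 135 | 95 | 120 | 1 | False | 1.111111 | Psychic |
| 71 | 65 | Mega Alakazam | Psychic |  | 590 | 55 | 50 | 65 | 175 | 95 | 150 | 1 | False | 0.769231 | Psychic |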


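### For the 5 `Combo Type` values printed in the previous question, calculate the aggregated `Attack` score for each `Combo Type`.

Note that `Water-Dark` appears twice in the top 5, so only four distinct `Combo Type` values are aggregated.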

```python
# enter your code here
pokemon_filtrado.groupby('Combo Type').agg({'Attack': 'sum'})
```

| Combo Type | Attack |
| --- | --- |
| Bug-Poison | 820 |
| Psychic | 2468 |
| Rock | 930 |
| Water-Dark | 720 |
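The question leaves "aggregated" open, and the cell above uses the sum. If other summaries are of interest as well, a sketch with pandas named aggregation computes several of them in one pass; each keyword argument becomes an output column. The `subset` frame is a hypothetical example, not `pokemon_filtrado`.

```python
import pandas as pd

# Hypothetical subset with two Combo Types
subset = pd.DataFrame({
    'Combo Type': ['Water-Dark', 'Water-Dark', 'Rock', 'Rock', 'Rock'],
    'Attack': [90, 120, 100, 80, 60],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = subset.groupby('Combo Type').agg(
    total_attack=('Attack', 'sum'),
    mean_attack=('Attack', 'mean'),
    n_pokemon=('Attack', 'count'),
)
print(summary)
```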


```python
# Sanity check: list all Water-Dark pokemon (their Attack values sum to 720)
pokemon[pokemon['Combo Type'] == 'Water-Dark']
```

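|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 141 | 130 | Mega Gyarados | Water | Dark | 640 | 95 | 155 | 109 | 70 | 130 | 81 | 1 | False | 1.422018 | Water-Dark |
| 347 | 318 | Carvanha | Water | Dark | 305 | 45 | 90 | 20 | 65 | 20 | 65 | 3 | False | 4.5 | Water-Dark |
| 348 | 319 | Sharpedo | Water | Dark | 460 | 70 | 120 | 40 | 95 | 40 | 95 | 3 | False | 3.0 | Water-Dark |
| 349 | 319 | Mega Sharpedo | Water | Dark | 560 | 70 | 140 | 70 | 110 | 65 | 105 | 3 | False | 2.0 | Water-Dark |
| 374 | 342 | Crawdaunt | Water | Dark | 468 | 63 | 120 | 85 | 90 | 55 | 55 | 3 | False | 1.411765 | Water-Dark |
| 726 | 658 | Greninja | Water | Dark | 530 | 72 | 95 | 67 | 103 | 71 | 122 | 6 | False | 1.41791 | Water-Dark |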


### `Total` formula hypothesis testing

From the data descriptions you may have noticed a column called `Total`, which indicates how strong the pokemon is. Form a hypothesis about how `Total` is calculated and test it.

The general approach is to first examine the data carefully and guess how `Total` might have been calculated. You can write a math formula and convert it to a function. Then calculate the results with your formula and store them in a new column called `Guessed Total`. Next, check whether `Guessed Total` and `Total` contain the same values. If they match, congratulations, you have verified your hypothesis! Otherwise, revise your formula, update the values in `Guessed Total`, and compare again.

```python
# Hypothesis: Total = HP + Attack + Defense + Sp. Atk + Sp. Def + Speed
pokemon['Guessed Total'] = (pokemon['HP'] + pokemon['Attack'] + pokemon['Defense']
                            + pokemon['Sp. Atk'] + pokemon['Sp. Def'] + pokemon['Speed'])

# True only if the guess matches Total on every row
all(pokemon['Guessed Total'] == pokemon['Total'])
```

```text
True
```
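When a hypothesis does not hold, it helps to see which rows disagree. The sketch below computes the same guess as a row-wise sum over a list of stat columns and then selects the mismatching rows. The two rows are copied from the Bulbasaur and Charmander entries shown earlier; `stat_cols` and `mismatches` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Stat values copied from rows shown earlier (Bulbasaur, Charmander)
stats = pd.DataFrame({
    'Total': [318, 309],
    'HP': [45, 39], 'Attack': [49, 52], 'Defense': [49, 43],
    'Sp. Atk': [65, 60], 'Sp. Def': [65, 50], 'Speed': [45, 65],
})

stat_cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Row-wise sum over the six base stats
stats['Guessed Total'] = stats[stat_cols].sum(axis=1)

# Rows where the hypothesis fails; an empty result means it holds everywhere
mismatches = stats[stats['Guessed Total'] != stats['Total']]
print(len(mismatches))
```

On the full data set, inspecting `mismatches` (rather than a single `True`/`False`) points directly at the rows that need a revised formula.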