# Challenge 1

In this challenge you will work with a **pokemon** data set. You will answer a series of questions to practice dataframe calculation, aggregation, and transformation.

![Pokemon](pokemon.jpg)

Follow the instructions below and enter your code.

### Import all required libraries

In [46]:
import numpy as np
import pandas as pd

### Import data set

Import data set `Pokemon.csv` from the `your-code` directory of this lab. Read the data into a dataframe called `pokemon`.

*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*

In [47]:
pokemon = pd.read_csv('Pokemon.csv')

### Print first 10 rows of `pokemon`

In [48]:
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.

For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.

| Column | Description |
| --- | --- |
| # | ID for each pokemon |
| Name | Name of each pokemon |
| Type 1 | Each pokemon has a type, which determines its weakness/resistance to attacks |
| Type 2 | Some pokemon are dual type and have a second type |
| Total | A general guide to how strong a pokemon is |
| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp. Atk | Special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp. Def | The base damage resistance against special attacks |
| Speed | Determines which pokemon attacks first each round |
| Generation | Number of generation |
| Legendary | True if the pokemon is Legendary, False otherwise |
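
In the rows printed earlier, `Total` matches the sum of the six base stats (for Bulbasaur, 45 + 49 + 49 + 65 + 65 + 45 = 318). Below is a minimal sketch of checking that reading of the column. It uses a small hand-built frame copied from the first rows rather than the full file, and `sample`, `stat_cols`, and `matches` are illustrative names that are not part of the lab.

```python
import pandas as pd

# Three rows copied from the head() output above (illustrative subset).
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Ivysaur', 'Charmander'],
    'Total': [318, 405, 309],
    'HP': [45, 60, 39],
    'Attack': [49, 62, 52],
    'Defense': [49, 63, 43],
    'Sp. Atk': [65, 80, 60],
    'Sp. Def': [65, 80, 50],
    'Speed': [45, 60, 65],
})

stat_cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Compare the row-wise sum of the six stats with the Total column.
matches = sample[stat_cols].sum(axis=1) == sample['Total']
print(matches.all())
```

Running the same comparison on the full `pokemon` dataframe would show whether the pattern holds for every row.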

### Obtain the distinct values across `Type 1` and `Type 2`

Extract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields.

We create a reusable function for this exercise.  
It returns the distinct values that a series takes.  
#### Input parameters:  
* df (dataframe)  
* col (column name as a string)  
#### Result:  
* a list with the distinct values of the series built from the dataframe and column passed as parameters

In [49]:
def get_values(df, col):
    x = df[col].value_counts()
    x_dict = x.to_dict()
    x_key_list = list(x_dict.keys())
    return x_key_list

In [50]:
type_1_keys_list = get_values(pokemon, 'Type 1')
type_1_keys_list

['Water',
 'Normal',
 'Grass',
 'Bug',
 'Psychic',
 'Fire',
 'Electric',
 'Rock',
 'Ghost',
 'Dragon',
 'Ground',
 'Dark',
 'Poison',
 'Fighting',
 'Steel',
 'Ice',
 'Fairy',
 'Flying']

In [51]:
type_2_keys_list = get_values(pokemon, 'Type 2')
type_2_keys_list

['Flying',
 'Ground',
 'Poison',
 'Psychic',
 'Fighting',
 'Grass',
 'Fairy',
 'Steel',
 'Dark',
 'Dragon',
 'Rock',
 'Water',
 'Ice',
 'Ghost',
 'Fire',
 'Electric',
 'Normal',
 'Bug']

In [52]:
result_after_func = list(set(type_1_keys_list + type_2_keys_list))
result_after_func

['Ground',
 'Steel',
 'Fairy',
 'Flying',
 'Rock',
 'Electric',
 'Ghost',
 'Fire',
 'Dark',
 'Poison',
 'Water',
 'Grass',
 'Dragon',
 'Ice',
 'Fighting',
 'Psychic',
 'Bug',
 'Normal']

In [53]:
type_1_keys = pokemon['Type 1'].unique()
type_1_keys

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

In [54]:
type_2_keys = pokemon['Type 2'].unique()
type_2_keys

array(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass',
       'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water',
       'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'], dtype=object)

In [55]:
all_types = set(list(type_1_keys) + list(type_2_keys))
result = list(all_types)
len(result)



19
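
The count of 19 includes `nan`, because `unique()` keeps missing values and `Type 2` is empty for single-type pokemon. Here is a sketch of one way to leave it out, shown on a toy frame (`toy` and `distinct_types` are illustrative names): concatenate both columns into one series, drop missing values, then take the unique values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the two type columns of the pokemon dataframe.
toy = pd.DataFrame({
    'Type 1': ['Grass', 'Fire', 'Fire', 'Water'],
    'Type 2': ['Poison', np.nan, 'Flying', np.nan],
})

# Concatenate both columns, discard NaN, and keep first-seen order.
distinct_types = (
    pd.concat([toy['Type 1'], toy['Type 2']])
    .dropna()
    .unique()
    .tolist()
)
print(distinct_types)
```

Applied to `pokemon`, the same chain should give the 18 type names without the `nan` entry.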

### Clean up `Name` values that contain "Mega"

If you have looked at the pokemon names carefully, you will have noticed that names containing "Mega" carry junk text in front of it. We want to clean up these names. For instance, "VenusaurMega Venusaur" should be "Mega Venusaur", and "CharizardMega Charizard X" should be "Mega Charizard X".

In [58]:
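def clean_values(x):
    # Keep only the part of the name starting at 'Mega',
    # e.g. 'VenusaurMega Venusaur' -> 'Mega Venusaur'.
    try:
        if 'Mega' in x:
            x = 'Mega' + x.split('Mega')[1]
        return x
    except TypeError:
        # Non-string values (e.g. NaN) are returned unchanged.
        return x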


In [134]:
pokemon['Name'] = pokemon['Name'].apply(clean_values)
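
A vectorized alternative to `apply`, as a sketch: `Series.str.replace` with a regular expression that drops everything before the first `Mega ` in a name. The trailing space in the pattern is an assumption of this sketch, chosen so that a name that merely starts with those letters (such as `Meganium`) is left alone. `names` and `cleaned` are illustrative names.

```python
import pandas as pd

names = pd.Series([
    'VenusaurMega Venusaur',
    'CharizardMega Charizard X',
    'Bulbasaur',
    'Meganium',
])

# Lazily match any prefix up to the first 'Mega ' and keep only the captured 'Mega '.
cleaned = names.str.replace(r'^.*?(Mega )', r'\1', regex=True)
print(cleaned.tolist())
```

Names without `Mega ` do not match the pattern, so they pass through unchanged.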

### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`

For instance, if a pokemon has an `Attack` score of 49 and a `Defense` score of 49, the corresponding `A/D Ratio` is 49/49=1.

In [60]:
pokemon['A/D Ratio'] = pokemon['Attack']/pokemon['Defense']
pokemon['A/D Ratio']

0      1.000000
1      0.984127
2      0.987952
3      0.813008
4      1.209302
5      1.103448
6      1.076923
7      1.171171
8      1.333333
9      0.738462
10     0.787500
11     0.830000
12     0.858333
13     0.857143
14     0.363636
15     0.900000
16     1.166667
17     0.500000
18     2.250000
19     3.750000
20     1.125000
21     1.090909
22     1.066667
23     1.000000
24     1.600000
25     1.350000
26     2.000000
27     1.384615
28     1.363636
29     1.231884
         ...   
770    1.000000
771    1.226667
772    1.017544
773    0.333333
774    1.428571
775    1.415094
776    1.428571
777    0.879121
778    1.458333
779    1.447368
780    0.942857
781    0.942857
782    0.942857
783    0.942857
784    0.737705
785    0.696721
786    0.778689
787    0.819672
788    0.811765
789    0.635870
790    0.857143
791    0.875000
792    1.378947
793    1.378947
794    0.826446
795    0.666667
796    1.454545
797    1.833333
798    2.666667
799    0.916667
Name: A/D Ratio, Length:

### Identify the pokemon with the highest `A/D Ratio`

In [73]:
pokemon_by_ratio = pokemon.sort_values(by=['A/D Ratio'], ascending= False)
pokemon_by_ratio.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0
347,318,Carvanha,Water,Dark,305,45,90,20,65,20,65,3,False,4.5
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.75
453,408,Cranidos,Rock,,350,67,125,40,30,30,58,4,False,3.125
348,319,Sharpedo,Water,Dark,460,70,120,40,95,40,95,3,False,3.0


In [68]:
highest_pokemon_at_ad_ratio = pokemon_by_ratio.iloc[0]['Name']
highest_pokemon_at_ad_ratio

'DeoxysAttack Forme'

### Identify the pokemon with the lowest A/D Ratio

In [75]:
lowest_pokemon_at_ad_ratio = pokemon_by_ratio.iloc[-1]
lowest_pokemon_at_ad_ratio
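
The sort-based lookups above can also be written without sorting. As a sketch on a toy frame (the rows and the names `toy`, `highest_name`, and `lowest_name` are illustrative), `idxmax` and `idxmin` return the index label of the extreme value, which `.loc` then uses to fetch the name:

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Bulbasaur', 'Carvanha', 'Kakuna'],
    'Attack': [49, 90, 25],
    'Defense': [49, 20, 50],
})
toy['A/D Ratio'] = toy['Attack'] / toy['Defense']

# idxmax/idxmin give the index label of the largest/smallest ratio.
highest_name = toy.loc[toy['A/D Ratio'].idxmax(), 'Name']
lowest_name = toy.loc[toy['A/D Ratio'].idxmin(), 'Name']
print(highest_name, lowest_name)
```

If several rows tie for the extreme value, both methods report the first one.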



### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.

Rules:

* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form `<Type 1>-<Type 2>`. For example, if `Type 1` is `Grass` and `Type 2` is `Poison`, `Combo Type` will be `Grass-Poison`.

* If `Type 1` has a valid value but `Type 2` does not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` and `Type 2` is `NaN`, `Combo Type` will be `Fire`.
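
The two rules can be sketched on a toy frame without changing the `Type 2` column first. This is one possible approach, not the only one: `Series.str.cat` returns a missing value whenever `Type 2` is missing, and `fillna` then falls back to `Type 1`. `toy` and `combo` are illustrative names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Type 1': ['Grass', 'Fire'],
    'Type 2': ['Poison', np.nan],
})

# str.cat yields NaN when Type 2 is missing; fillna then falls back to Type 1.
combo = toy['Type 1'].str.cat(toy['Type 2'], sep='-').fillna(toy['Type 1'])
print(combo.tolist())
```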

In [80]:
def replace_nan_strings(df, col):
    # Replace missing values in the column with an empty string.
    return df[col].fillna('')

In [103]:
pokemon['Type 1'].value_counts()

Water       112
Normal       98
Grass        70
Bug          69
Psychic      57
Fire         52
Electric     44
Rock         44
Ghost        32
Dragon       32
Ground       32
Dark         31
Poison       28
Fighting     27
Steel        27
Ice          24
Fairy        17
Flying        4
Name: Type 1, dtype: int64

In [101]:
pokemon['Type 2'].value_counts()

            386
Flying       97
Ground       35
Poison       34
Psychic      33
Fighting     26
Grass        25
Fairy        23
Steel        22
Dark         20
Dragon       18
Ice          14
Rock         14
Water        14
Ghost        14
Fire         12
Electric      6
Normal        4
Bug           3
Name: Type 2, dtype: int64

In [98]:
pokemon['Type 1'] = replace_nan_strings(pokemon, 'Type 1')

In [85]:
pokemon['Type 2'] = replace_nan_strings(pokemon, 'Type 2')

In [102]:
pokemon['Combo Type'] = np.where(pokemon['Type 2'] == '',pokemon['Type 1'],  pokemon['Type 1']+ '-' + pokemon['Type 2'])
pokemon['Combo Type']

0         Grass-Poison
1         Grass-Poison
2         Grass-Poison
3         Grass-Poison
4                 Fire
5                 Fire
6          Fire-Flying
7          Fire-Dragon
8          Fire-Flying
9                Water
10               Water
11               Water
12               Water
13                 Bug
14                 Bug
15          Bug-Flying
16          Bug-Poison
17          Bug-Poison
18          Bug-Poison
19          Bug-Poison
20       Normal-Flying
21       Normal-Flying
22       Normal-Flying
23       Normal-Flying
24              Normal
25              Normal
26       Normal-Flying
27       Normal-Flying
28              Poison
29              Poison
            ...       
770              Fairy
771    Fighting-Flying
772     Electric-Fairy
773         Rock-Fairy
774             Dragon
775             Dragon
776             Dragon
777        Steel-Fairy
778        Ghost-Grass
779        Ghost-Grass
780        Ghost-Grass
781        Ghost-Grass
782        

### Identify the 5 pokemons with the highest `A/D Ratio`

In [110]:
top_five_pokemon_at_ad_ratio = pokemon_by_ratio.iloc[0:5]
top_five_pokemon_at_ad_ratio

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,9.0
347,318,Carvanha,Water,Dark,305,45,90,20,65,20,65,3,False,4.5
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.75
453,408,Cranidos,Rock,,350,67,125,40,30,30,58,4,False,3.125
348,319,Sharpedo,Water,Dark,460,70,120,40,95,40,95,3,False,3.0
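
As a sketch of an alternative to slicing the sorted dataframe, `DataFrame.nlargest` picks the top rows by a column directly. The toy values and the names `toy` and `top_two` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'A/D Ratio': [1.0, 4.5, 0.5, 3.0],
})

# Keep the two rows with the largest ratio, ordered from highest to lowest.
top_two = toy.nlargest(2, 'A/D Ratio')
print(top_two['Name'].tolist())
```

On the lab data the call would presumably be `pokemon.nlargest(5, 'A/D Ratio')`.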


### For the 5 pokemons printed above, aggregate `Combo Type` and use a list to store the unique values.

Your end product is a list containing the distinct `Combo Type` values of the 5 pokemons with the highest `A/D Ratio`.

In [128]:
top_five_pokemon_at_ad_ratio['Type 2'] = replace_nan_strings(top_five_pokemon_at_ad_ratio, 'Type 2')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [129]:
top_five_pokemon_at_ad_ratio['Combo Type'] = np.where(top_five_pokemon_at_ad_ratio['Type 2'] == '', top_five_pokemon_at_ad_ratio['Type 1'], top_five_pokemon_at_ad_ratio['Type 1']+ '-' + top_five_pokemon_at_ad_ratio['Type 2'])
top_five_pokemon_at_ad_ratio['Combo Type']




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


429       Psychic
347    Water-Dark
19     Bug-Poison
453          Rock
348    Water-Dark
Name: Combo Type, dtype: object
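
The warnings above appear because `top_five_pokemon_at_ad_ratio` was created by slicing another dataframe, so pandas cannot tell whether the assignment is meant to reach the original. A minimal sketch of one common way to avoid this is to take an explicit `.copy()` of the slice before assigning. The toy rows and the names `toy` and `top` are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Type 1': ['Psychic', 'Water', 'Bug'],
    'Type 2': [np.nan, 'Dark', 'Poison'],
    'A/D Ratio': [9.0, 4.5, 3.75],
})

# .copy() makes the slice an independent dataframe, so assignments are unambiguous.
top = toy.sort_values(by='A/D Ratio', ascending=False).iloc[0:2].copy()
top['Type 2'] = top['Type 2'].fillna('')
top['Combo Type'] = np.where(top['Type 2'] == '',
                             top['Type 1'],
                             top['Type 1'] + '-' + top['Type 2'])
print(top['Combo Type'].tolist())
```

The original frame is left untouched: `toy` still has its missing `Type 2` value and no `Combo Type` column.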

In [117]:
highest_combo_types = top_five_pokemon_at_ad_ratio['Combo Type'].unique()
highest_combo_types

array(['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock'], dtype=object)

### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemons.

Your output should look like below:

![Aggregate](aggregated-mean.png)

In [135]:
greatest = pokemon[pokemon['Combo Type'].isin(highest_combo_types)]
greatest

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio,Combo Type
16,13,Weedle,Bug,Poison,195,40,35,30,20,20,50,1,False,1.166667,Bug-Poison
17,14,Kakuna,Bug,Poison,205,45,25,50,25,25,35,1,False,0.500000,Bug-Poison
18,15,Beedrill,Bug,Poison,395,65,90,40,45,80,75,1,False,2.250000,Bug-Poison
19,15,Mega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,3.750000,Bug-Poison
53,48,Venonat,Bug,Poison,305,60,55,50,40,55,45,1,False,1.100000,Bug-Poison
54,49,Venomoth,Bug,Poison,450,70,65,60,90,75,90,1,False,1.083333,Bug-Poison
68,63,Abra,Psychic,,310,25,20,15,105,55,90,1,False,1.333333,Psychic
69,64,Kadabra,Psychic,,400,40,35,30,120,70,105,1,False,1.166667,Psychic
70,65,Alakazam,Psychic,,500,55,50,45,135,95,120,1,False,1.111111,Psychic
71,65,Mega Alakazam,Psychic,,590,55,50,65,175,95,150,1,False,0.769231,Psychic


In [124]:
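# numeric_only=True keeps the mean to numeric columns; recent pandas versions raise on text columns otherwise.
greatest_stats = greatest.groupby(['Combo Type']).mean(numeric_only=True)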
greatest_stats

Combo Type,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,A/D Ratio
Bug-Poison,199.166667,347.916667,53.75,68.333333,58.083333,42.5,59.333333,65.916667,2.333333,0.0,1.315989
Psychic,381.973684,464.552632,72.552632,64.947368,67.236842,98.552632,82.394737,78.868421,3.342105,0.236842,1.164196
Rock,410.111111,409.444444,67.111111,103.333333,107.222222,40.555556,58.333333,32.888889,3.888889,0.111111,1.260091
Water-Dark,347.666667,493.833333,69.166667,120.0,65.166667,88.833333,63.5,87.166667,3.166667,0.0,2.291949
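
The filter-then-aggregate pattern used here can be sketched on a toy frame. The rows below are a hand-picked subset with only two stat columns, and `toy`, `wanted`, and `means` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Weedle', 'Kakuna', 'Abra', 'Charmander'],
    'Combo Type': ['Bug-Poison', 'Bug-Poison', 'Psychic', 'Fire'],
    'HP': [40, 45, 25, 39],
    'Attack': [35, 25, 20, 52],
})

wanted = ['Bug-Poison', 'Psychic']

# Keep only the wanted combo types, then average the numeric columns per group.
means = (
    toy[toy['Combo Type'].isin(wanted)]
    .groupby('Combo Type')
    .mean(numeric_only=True)
)
print(means)
```

Rows whose `Combo Type` is not in `wanted` (here `Fire`) never reach the `groupby`, so they do not appear in the result.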
