# Challenge - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is also used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](https://keydifferences.com/difference-between-t-test-and-anova.html).

From the ANOVA test, you receive two numbers. The first is the **F-value**, which indicates whether your null hypothesis can be rejected. The critical F-value needed to reject the null hypothesis depends on the total number of subjects and the number of groups in your experiment: for k groups and N subjects, the F distribution has k − 1 and N − k degrees of freedom. In [this table](https://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
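As a sketch of where the table values come from, SciPy's F distribution (`scipy.stats.f`) can compute the critical F-value directly from the two degrees of freedom. The helper name `critical_f` and the 3-group, 30-subject example are illustrative assumptions, not part of the challenge.

```python
from scipy.stats import f


def critical_f(n_groups: int, n_subjects: int, alpha: float = 0.05) -> float:
    """Return the critical F-value for a one-way ANOVA (hypothetical helper).

    Numerator degrees of freedom: n_groups - 1.
    Denominator degrees of freedom: n_subjects - n_groups.
    """
    dfn = n_groups - 1
    dfd = n_subjects - n_groups
    # ppf is the inverse CDF: the F-value with (1 - alpha) of the probability mass below it
    return f.ppf(1 - alpha, dfn, dfd)


# Illustrative example: 3 groups, 30 subjects in total
crit = critical_f(3, 30)
print(round(crit, 2))
```

If an observed F-value exceeds `crit`, the null hypothesis would be rejected at the 5% level, which is the same decision the p-value comparison gives.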

The p-value is the other number yielded by ANOVA, and it already takes the total number of subjects and the number of groups into account. The null hypothesis of a one-way ANOVA is that all group means are equal. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
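To see that the p-value already accounts for the group and subject counts, a minimal sketch can recompute it from the F-value with the F distribution's survival function (`scipy.stats.f.sf`) and compare it with what `f_oneway` reports. The three small groups are made-up numbers for illustration only.

```python
from scipy.stats import f, f_oneway

# Made-up example data: three groups of measurements
groups = [
    [310, 405, 530, 480],
    [300, 390, 420, 455, 410],
    [520, 600, 580, 550],
]

result = f_oneway(*groups)

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total number of subjects
# P(F >= observed F) under the null hypothesis, with df = (k - 1, n - k)
p_manual = f.sf(result.statistic, k - 1, n - k)

print(result.pvalue, p_manual)
print("reject H0" if result.pvalue < 0.05 else "fail to reject H0")
```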

In this challenge, we want to understand whether there are significant differences in the `Total` value among the various pokemon types, e.g. Grass vs Poison vs Fire vs Dragon... Because there are many pokemon types, this is a perfect use case for ANOVA.

```python
# Import libraries

import pandas as pd
from scipy.stats import f_oneway
import numpy as np
```

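```python
# Import dataset

pokemon = pd.read_csv('../Pokemon.csv')

# Normalize column names: lowercase, '.' -> '_', drop spaces
# (e.g. 'Type 1' -> 'type1', 'Sp. Atk' -> 'sp_atk')
pokemon.columns = [i.lower().replace('.', '_').replace(' ', '') for i in pokemon.columns]
pokemon.head()
```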

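|   | # | name | type1 | type2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |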


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**
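The three steps can also be sketched with pandas `groupby`, which builds one group per distinct value and drops `NaN` keys by default. This is an alternative to the loop used in this notebook, shown on a tiny made-up DataFrame that only mimics the `type1` and `total` columns; like the loop, it groups by primary type only.

```python
import pandas as pd
from scipy.stats import f_oneway

# Tiny made-up frame with the same column names as the cleaned Pokemon data
df = pd.DataFrame({
    "type1": ["Grass", "Grass", "Fire", "Fire", "Water", "Water", "Water"],
    "total": [318, 405, 309, 534, 314, 405, 530],
})

# Steps 1 and 2: one list of totals per distinct type (groupby sorts the keys)
groups = [g["total"].tolist() for _, g in df.groupby("type1")]

# Step 3: one-way ANOVA across all groups
result = f_oneway(*groups)
print(len(groups), result.statistic, result.pvalue)
```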

#### First, let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 combined. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19, including `NaN`. You can disregard `NaN` in the next step.*
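A short sketch of why `NaN` shows up in the count: `Series.unique()` keeps missing values, whereas calling `dropna()` first leaves only the type names. The two small Series are made-up example data.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the type1 / type2 columns
type1 = pd.Series(["Grass", "Fire", "Water"])
type2 = pd.Series(["Poison", np.nan, "Poison"])

combined = pd.concat([type1, type2])

with_nan = combined.unique()              # NaN is kept as its own value
without_nan = combined.dropna().unique()  # NaN removed before deduplicating

print(len(with_nan), len(without_nan))
# NaN is a float, so an isinstance(..., str) check also filters it out
print(type(np.nan), [t for t in with_nan if isinstance(t, str)])
```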

```python
# Stack both type columns and keep every distinct value (NaN included)
unique_types = pd.concat([pokemon['type1'], pokemon['type2']]).unique()
assert len(unique_types) == 19  # 18 types plus NaN
unique_types
```

```
array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying', nan], dtype=object)
```

#### Second, we will create a list named `pokemon_totals` that contains the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` values belong to which pokemon type.
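To make that concrete, a minimal sketch: `f_oneway` only sees a sequence of anonymous groups, so reordering the groups (and with it any notion of which type is which) leaves the F-value and p-value unchanged up to floating-point rounding. The numbers are made up for illustration.

```python
import random

from scipy.stats import f_oneway

# Made-up groups; no type labels are attached to them
groups = [[318, 405, 525], [309, 405, 534, 634], [314, 405, 530, 630], [198, 295, 395]]

original = f_oneway(*groups)

shuffled = groups[:]                # shallow copy, then reorder the groups
random.Random(0).shuffle(shuffled)  # seeded for reproducibility
reordered = f_oneway(*shuffled)

print(original.statistic, reordered.statistic)
print(original.pvalue, reordered.pvalue)
```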

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` value, which you can confirm by using `type()`. The valid pokemon type values are all of type `str`.
* At the end, the length of your `pokemon_totals` should be 18.

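```python
pokemon_totals = []
for e in unique_types:
    # Skip NaN (a float); valid type names are strings
    if not isinstance(e, str):
        continue
    # Group by primary type (type1) only, so each pokemon belongs to exactly
    # one group (one-way ANOVA assumes independent groups)
    pokemon_groups = pokemon[pokemon['type1'] == e]

    # List of Total values for this type (not a sum)
    totals = pokemon_groups['total'].tolist()

    pokemon_totals.append(totals)
len(pokemon_totals)
```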

```
18
```

#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, i.e. `f_oneway(*pokemon_totals)`. This unpacks the list and passes each item as a separate argument to `f_oneway`.
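A minimal illustration of the `*` trick: unpacking a list passes each element as its own positional argument, so `f_oneway(*groups)` is the same call as listing the groups by hand. `count_args` is a hypothetical helper for demonstration only, and the groups are made-up numbers.

```python
from scipy.stats import f_oneway


def count_args(*args):
    """Hypothetical helper: report how many positional arguments it received."""
    return len(args)


groups = [[318, 405, 525], [309, 405, 534], [314, 405, 530, 630]]

print(count_args(groups))   # → 1 (the whole list is a single argument)
print(count_args(*groups))  # → 3 (one argument per group)

unpacked = f_oneway(*groups)
by_hand = f_oneway(groups[0], groups[1], groups[2])
print(unpacked.statistic == by_hand.statistic)  # → True
```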

```python
# Drop any empty groups (e.g. a type that never appears as type1) before running ANOVA
pokemon_totals = [total for total in pokemon_totals if len(total) > 0]

anova_result = f_oneway(*pokemon_totals)

anova_result
```


```
F_onewayResult(statistic=4.63876748166055, pvalue=2.077215448842098e-09)
```

```python
# Inspect the per-type lists of Total values
pokemon_totals
```

```
[[318, 405, 525, 625, 320, 395, 490, 300, 390, 490,
  325, 520, 435, 318, 405, 525, 490, 250, 340, 460,
  180, 425, 310, 405, 530, 630, 220, 340, 480, 295,
  460, 400, 335, 475, 460, 318, 405, 525, 280, 515,
  275, 450, 454, 334, 494, 594, 535, 525, 600, 600,
  308, 413, 528, 316, 498, 280, 480, 280, 480, 461,
  294, 464, 305, 489, 580, 313, 405, 530, 350, 531],
 [309, 405, 534, 634, 634, 299, 505, 350, 555, 410,
  500, 495, 525, 580, 309, 405, 534, 250, 410, 365,
  580, 680, 310, 405, 530, 630, 305, 460, 560, 470,
  309, 405, 534, 540, 600, 308, 418, 528, 316, 498,
  315, 480, 540, 484, 307, 409, 534, 382, 499, 369,
  507, 600],
 [314, 405, 530, 630, 320, 500, 300, 385, 510, 335,
  515, 315, 490, 590, 325, 475, 305, 525, 325, 475,
  29
```

#### Interpret the ANOVA test result. Is the difference significant?

The p-value (about 2.08e-09) is far below 0.05, so we reject H0: there is a significant difference in the mean `Total` for at least one pokemon type.

```python
# Drop NaN so unique_types lines up with the 18 lists in pokemon_totals
unique_types = [e for e in unique_types if pd.notna(e)]
```

```python
# Mean Total per type ("medias" = means), in the same order as unique_types
medias = [np.mean(i) for i in pokemon_totals]

# Use the type name as the key: two types could share a mean, but not a name
dict(zip(unique_types, medias))
```

```
{'Grass': 421.14285714285717,
 'Fire': 458.0769230769231,
 'Water': 430.45535714285717,
 'Bug': 378.92753623188406,
 'Normal': 401.68367346938777,
 'Poison': 399.14285714285717,
 'Electric': 443.40909090909093,
 'Ground': 437.5,
 'Fairy': 413.1764705882353,
 'Fighting': 416.44444444444446,
 'Psychic': 475.94736842105266,
 'Rock': 453.75,
 'Ghost': 439.5625,
 'Ice': 433.4583333333333,
 'Dragon': 550.53125,
 'Dark': 445.741935483871,
 'Steel': 487.7037037037037,
 'Flying': 485.0}
```

At a rough glance, the highest mean belongs to Dragon (about 550.5).
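Rather than eyeballing the means, a short sketch can pick the highest one programmatically with `max(..., key=...)`. The dictionary copies only three of the means printed above, as example input.

```python
# A few type -> mean Total pairs copied from the output above (example input)
type_means = {
    "Psychic": 475.94736842105266,
    "Dragon": 550.53125,
    "Steel": 487.7037037037037,
}

# Key whose value is largest
best_type = max(type_means, key=type_means.get)
print(best_type, type_means[best_type])
```

With the full DataFrame, `pokemon.groupby('type1')['total'].mean().idxmax()` should give the same answer, since the means above are computed per `type1`.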