# Challenge - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](https://keydifferences.com/difference-between-t-test-and-anova.html).
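To see how the two tests relate, here is a minimal sketch on hypothetical toy data (the numbers are made up and do not come from the Pokemon dataset). With exactly two groups and equal variances assumed, the one-way ANOVA F-value should equal the square of the t statistic, and both tests should report the same p-value.

```python
import numpy as np
from scipy import stats

# Hypothetical toy measurements for two groups (not from the Pokemon dataset)
group_a = np.array([4.1, 5.0, 5.5, 6.2, 4.8])
group_b = np.array([6.0, 6.8, 7.1, 5.9, 7.4])

# Two-sample t-test (equal variances assumed, which is the scipy default)
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# One-way ANOVA on the same two groups
f_stat, f_p = stats.f_oneway(group_a, group_b)

print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")
print(f"t-test p = {t_p:.6f}, ANOVA p = {f_p:.6f}")
```

With three or more groups there is no single t statistic to square, which is where ANOVA takes over.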

From the ANOVA test, you receive two numbers. The first is the **F-value**, which you compare against a critical value to decide whether the null hypothesis can be rejected. The critical F-value varies with the total number of subjects and the number of groups in your experiment. You can find the critical values of the F distribution in [this table](https://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm). **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
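If you would rather not read the table, the critical value can also be computed. The sketch below assumes a significance level of 0.05 and a hypothetical experiment with 3 groups and 30 subjects in total; these numbers are illustrative only. It uses `scipy.stats.f.ppf`, the inverse cumulative distribution function of the F distribution.

```python
from scipy import stats

alpha = 0.05   # significance level (assumed for illustration)
n_groups = 3   # hypothetical number of groups
n_total = 30   # hypothetical total number of subjects

# Degrees of freedom: between groups and within groups
dfn = n_groups - 1
dfd = n_total - n_groups

# Critical value: an observed F above this rejects the null hypothesis at alpha
f_critical = stats.f.ppf(1 - alpha, dfn, dfd)
print(f"Critical F({dfn}, {dfd}) at alpha={alpha}: {f_critical:.3f}")
```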

The p-value is the other number yielded by ANOVA, and it already takes the total number of subjects and the number of groups into account. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
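As a rough illustration of how the p-value folds in the group and subject counts, the sketch below converts a hypothetical F-value into a p-value using the survival function of the F distribution, `scipy.stats.f.sf`. The F-value and degrees of freedom are made-up numbers, not results from this challenge.

```python
from scipy import stats

f_observed = 4.2    # hypothetical F-value
dfn, dfd = 2, 27    # hypothetical degrees of freedom (3 groups, 30 subjects)

# Probability of seeing an F at least this large if the null hypothesis holds
p_value = stats.f.sf(f_observed, dfn, dfd)

# Usual decision rule at the 0.05 level
reject_null = p_value < 0.05
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```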

In this challenge, we want to understand whether the `Total` value differs significantly among Pokémon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many Pokémon types, which makes this a perfect use case for ANOVA.


In [1]:
# Import libraries

import pandas as pd

In [6]:
# Import dataset

pokemon = pd.read_csv('../Pokemon.csv')

pokemon.head()

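|   | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|------|--------|--------|-------|----|--------|---------|---------|---------|-------|------------|-----------|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |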


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in the next step.*
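One possible way to aggregate two columns is to stack them into a single Series and then take the unique values. The sketch below shows the idea on a hypothetical four-row table (`toy` and `toy_unique_types` are illustrative names, not part of the challenge).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Pokemon table
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Bug"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
})

# Stack both columns into one Series, then take the unique values.
# NaN is kept once, just like any other distinct value.
toy_unique_types = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()

print(toy_unique_types)
print(len(toy_unique_types))  # 6 type names plus NaN
```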

In [17]:
# Your code here
# Count the unique values in Type 1
tipo1 = len(pokemon['Type 1'].unique())
tipo1

18

In [19]:
# Count the unique values in Type 2 (NaN counts as one value)
tipo2 = len(pokemon['Type 2'].unique())
tipo2

19

In [22]:
pokemon['Type 1'].unique()

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

In [24]:
# Aggregate Type 1 and Type 2 before extracting the unique values
unique_types = pd.concat([pokemon['Type 1'], pokemon['Type 2']]).unique()

In [25]:
len(unique_types) # you should see 19

19

#### Second, we will create a list named `pokemon_totals` containing the `Total` values for each unique Pokémon type.

Why do we use a list instead of a dictionary to store the `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` belongs to which Pokémon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float`, which you can confirm by using `type()`. The valid Pokémon type values are all of type `str`.
* At the end, the length of your `pokemon_totals` should be 18.
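The hints above can be sketched on a hypothetical four-row table (all `toy_*` names are illustrative). Each type name produces one Series of `Total` values, and the `NaN` entry is skipped because it is not a `str`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Pokemon table
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass"],
    "Type 2": ["Poison", np.nan, np.nan, np.nan],
    "Total": [318, 309, 314, 300],
})
toy_types = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()

toy_totals = []
for toy_type in toy_types:
    # NaN is a float, so it fails the str check and is skipped
    if not isinstance(toy_type, str):
        continue
    # Keep rows where either type column matches the current type
    mask = (toy["Type 1"] == toy_type) | (toy["Type 2"] == toy_type)
    toy_totals.append(toy.loc[mask, "Total"])

print(len(toy_totals))  # one group per type name
```

Note that a row with two types lands in two groups under this selection rule.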

In [47]:
pokemon_totals = []

# Your code here

for pokemon_type in unique_types:
    if isinstance(pokemon_type, str):  # Skip the NaN value
        total = pokemon[(pokemon['Type 2'] == pokemon_type)|(pokemon['Type 1'] == pokemon_type)]['Total']
        pokemon_totals.append(total)

len(pokemon_totals) # you should see 18

18

In [41]:
pokemon_totals

[28     288
 29     438
 34     275
 35     365
 37     273
 38     365
 95     325
 96     500
 117    340
 118    490
 345    302
 346    467
 368    458
 629    329
 630    474
 Name: Total, dtype: int64,
 702    580
 703    580
 Name: Total, dtype: int64,
 159    300
 160    420
 406    300
 407    420
 671    320
 672    410
 673    540
 682    485
 774    300
 775    452
 776    600
 Name: Total, dtype: int64,
 32     300
 33     450
 55     265
 56     405
 112    320
 113    425
 250    330
 251    500
 359    290
 423    670
 499    330
 500    525
 588    328
 Name: Total, dtype: int64,
 40     323
 41     483
 187    218
 189    245
 225    300
 226    450
 737    303
 738    371
 739    552
 752    341
 753    462
 754    341
 755    480
 770    525
 792    680
 Name: Total, dtype: int64,
 122    435
 166    318
 167    405
 168    525
 197    490
 206    180
 207    425
 272    310
 273    405
 274    530
 296    220
 309    295
 362    335
 432    318
 433    405
 467    

#### Now we run ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `stats.f_oneway(*pokemon_totals)`. This unpacks the list and supplies each list item as a separate argument to `f_oneway`.
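To illustrate the `*` trick, the sketch below uses a hypothetical list of three small groups (`toy_groups` is a made-up name with made-up values). Unpacking the list should give the same result as passing each group by hand.

```python
from scipy import stats

# Hypothetical groups of Total values (illustrative numbers only)
toy_groups = [
    [310, 405, 525],
    [309, 405, 534],
    [314, 405, 530, 630],
]

# The star unpacks the list into separate positional arguments
unpacked = stats.f_oneway(*toy_groups)

# Equivalent call with the groups written out one by one
explicit = stats.f_oneway(toy_groups[0], toy_groups[1], toy_groups[2])

print(unpacked.statistic, unpacked.pvalue)
print(explicit.statistic, explicit.pvalue)
```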

In [48]:
# Your code here
from scipy.stats import f_oneway

# Unpack the list so that each group is passed as a separate argument
f_value, p_value = f_oneway(*pokemon_totals)

# Print the results
print("ANOVA F-value:", f_value)
print("ANOVA p-value:", p_value)

ANOVA F-value: 6.617538296005537
ANOVA p-value: 2.6457458815984803e-15


#### Interpret the ANOVA test result. Is the difference significant?

A high F-value indicates that the variation between groups is large compared with the variation within groups. If the p-value is below a predefined significance level (for example, 0.05), at least some of the group means are considered significantly different.

In this case, because the p-value is extremely small (2.6457e-15), we can conclude that there are significant differences in `Total` among the different Pokémon types. This implies that at least some Pokémon types have different mean totals, and that the differences should not be attributed to random variation alone.
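To make the idea of between-group versus within-group variation concrete, the sketch below computes the F-value by hand for three hypothetical groups (made-up numbers, not Pokemon data) and compares it with `scipy.stats.f_oneway`. It is a minimal illustration of the standard one-way ANOVA formula, not a replacement for the library call.

```python
import numpy as np
from scipy import stats

# Hypothetical groups with clearly separated means
groups = [
    np.array([300.0, 320.0, 310.0]),
    np.array([400.0, 420.0, 410.0]),
    np.array([500.0, 520.0, 510.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)       # number of groups
n = all_values.size   # total number of observations

# Variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares: sums of squares divided by their degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)

f_manual = ms_between / ms_within
f_scipy, _ = stats.f_oneway(*groups)
print(f_manual, f_scipy)
```

Because the group means here are far apart relative to the spread inside each group, the ratio is large.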

