# Challenge - ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between a t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](https://keydifferences.com/difference-between-t-test-and-anova.html).

From the ANOVA test, you receive two numbers. The first is the **F-value**, the test statistic: you reject the null hypothesis when it exceeds a critical F-value. That critical value depends on the total number of subjects and the number of groups in your experiment. [This table](https://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm) lists the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
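If you would rather compute a critical value than look it up, `scipy.stats.f.ppf` can do it. The sketch below is only an illustration: the group count, subject count, and significance level are hypothetical numbers, not values from the Pokemon data.

```python
from scipy.stats import f

# Hypothetical experiment (not the Pokemon data): 3 groups, 30 subjects in total.
n_groups = 3
n_subjects = 30
alpha = 0.05

dfn = n_groups - 1           # between-group degrees of freedom
dfd = n_subjects - n_groups  # within-group degrees of freedom

# The critical value is the (1 - alpha) quantile of the F distribution.
f_critical = f.ppf(1 - alpha, dfn, dfd)
print(f"Critical F({dfn}, {dfd}) at alpha={alpha}: {f_critical:.3f}")
```

An F-value larger than `f_critical` would lead you to reject the null hypothesis at that significance level.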

The p-value is the other number returned by ANOVA. It already takes the total number of subjects and the number of groups into account. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
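To see how the two numbers relate, here is a minimal sketch on toy data. The three groups are invented for illustration. It recomputes the p-value from the F-value with `scipy.stats.f.sf` and then applies the 0.05 rule.

```python
from scipy.stats import f, f_oneway

# Toy data, invented for illustration: three groups of five measurements.
group_a = [4.1, 5.0, 4.8, 5.2, 4.6]
group_b = [5.9, 6.3, 6.1, 5.7, 6.4]
group_c = [4.9, 5.1, 5.3, 4.7, 5.0]

result = f_oneway(group_a, group_b, group_c)

# The p-value is the upper-tail probability of the F distribution at the
# observed F-value, using the same degrees of freedom as the test.
dfn = 3 - 1    # number of groups minus 1
dfd = 15 - 3   # total subjects minus number of groups
p_from_f = f.sf(result.statistic, dfn, dfd)

alpha = 0.05
reject_null = result.pvalue < alpha
print(result.statistic, result.pvalue, p_from_f, reject_null)
```

Because the p-value already accounts for the degrees of freedom, comparing it with 0.05 gives the same decision as comparing the F-value with its critical value.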

In this challenge, we want to find out whether the `Total` value differs significantly among pokemon types, e.g. Grass vs. Poison vs. Fire vs. Dragon. There are many pokemon types, which makes this a good use case for ANOVA.

In [1]:
# Import libraries

import pandas as pd

In [31]:
# Import dataset

pokemon = pd.read_csv('../Pokemon.csv')

pokemon[pokemon['Type 1']=='Electric'].head()

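| Unnamed: 0 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 25 | Pikachu | Electric | NaN | 320 | 35 | 55 | 40 | 50 | 50 | 90 | 1 | False |
| 31 | 26 | Raichu | Electric | NaN | 485 | 60 | 90 | 55 | 90 | 80 | 110 | 1 | False |
| 88 | 81 | Magnemite | Electric | Steel | 325 | 25 | 35 | 70 | 95 | 55 | 45 | 1 | False |
| 89 | 82 | Magneton | Electric | Steel | 465 | 50 | 60 | 95 | 120 | 70 | 70 | 1 | False |
| 108 | 100 | Voltorb | Electric | NaN | 330 | 40 | 30 | 50 | 55 | 55 | 100 | 1 | False |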


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19, including `NaN`. You can disregard `NaN` in the next step.*

In [25]:
# Your code here

type1=list(pokemon['Type 1'].unique())
type2=list(pokemon['Type 2'].unique())
both=type1+type2
unique_types=list(set(both))

len(unique_types) # you should see 19

19
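Another way to aggregate the two columns is to stack them into a single Series before calling `unique()`. The sketch below uses a small toy frame invented for illustration, not the real `Pokemon.csv`, and the names `toy`, `all_types`, `with_nan`, and `toy_unique_types` are made up for this example.

```python
import numpy as np
import pandas as pd

# Toy frame, invented for illustration; the real data comes from Pokemon.csv.
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Electric", "Electric"],
    "Type 2": ["Poison", np.nan, np.nan, "Steel"],
})

# Stack both type columns into one Series, then take the unique values.
all_types = pd.concat([toy["Type 1"], toy["Type 2"]], ignore_index=True)

with_nan = list(all_types.unique())                   # NaN is kept as one value
toy_unique_types = list(all_types.dropna().unique())  # NaN removed up front

print(len(with_nan), len(toy_unique_types))
```

Calling `dropna()` before `unique()` removes `NaN` at the source, so no later step has to filter it out.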

In [27]:
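# Drop NaN by type rather than by position: a list built from a set has no guaranteed order
unique_types = [t for t in unique_types if isinstance(t, str)]

len(unique_types)

18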

#### Second, we will create a list named `pokemon_totals` to hold the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means, not which group(s) differ. Therefore, we don't need to know which `Total` values belong to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float`, which you can check with `type()`. All valid pokemon type values are of type `str`.
* At the end, the length of your `pokemon_totals` should be 18.
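The type check from the hints can be sketched as follows. The `candidates` list is a toy example invented for illustration.

```python
# Toy list, invented for illustration: two valid types plus one NaN.
candidates = ["Grass", float("nan"), "Fire"]

nan_value = candidates[1]
print(type(nan_value))          # NaN is a float
print(nan_value == nan_value)   # NaN never compares equal, not even to itself

# Keeping only str values drops NaN without relying on its position in the list.
valid_types = [t for t in candidates if isinstance(t, str)]
print(valid_types)
```

Filtering by type is safer than comparing against `NaN` with `==`, because that comparison is always false.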

In [None]:
pokemon_totals = []
for t in unique_types:
    # Select rows where either type column matches, then keep the Total column
    mask = (pokemon['Type 1'] == t) | (pokemon['Type 2'] == t)
    pokemon_totals.append(pokemon.loc[mask, 'Total'])
pokemon_totals

In [87]:
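# Alternative row-by-row version: slower, but it makes the selection logic explicit
pokemon_totals = []
for i in unique_types:
    temp = []
    for j in range(len(pokemon)):
        # Keep the Total of every pokemon whose Type 1 or Type 2 matches
        if pokemon['Type 1'][j] == i or pokemon['Type 2'][j] == i:
            temp.append(pokemon.Total[j])
    pokemon_totals.append(temp)
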

In [89]:
len(pokemon_totals)

18

#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a single list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `stats.f_oneway(*pokemon_totals)`. This unpacks the list and passes each item as a separate argument to `f_oneway`.
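A minimal sketch of the unpacking trick, using toy groups invented for illustration. It shows that `*groups` is equivalent to passing each group by hand.

```python
from scipy.stats import f_oneway

# Toy groups, invented for illustration.
groups = [
    [4.1, 5.0, 4.8, 5.2],
    [5.9, 6.3, 6.1, 5.7],
    [4.9, 5.1, 5.3, 4.7],
]

# `*groups` passes each inner list as its own positional argument...
unpacked = f_oneway(*groups)

# ...which is the same call as listing the groups explicitly.
explicit = f_oneway(groups[0], groups[1], groups[2])

print(unpacked.statistic == explicit.statistic)
print(unpacked.pvalue == explicit.pvalue)
```

The unpacked form works for any number of groups, which is convenient when the list has 18 entries.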

In [90]:
from scipy.stats import f_oneway, norm, f, chi2, chi2_contingency

import statsmodels.api as sm
from statsmodels.formula.api import ols

f_oneway(*pokemon_totals)  


F_onewayResult(statistic=6.617538296005536, pvalue=2.6457458815984803e-15)

#### Interpret the ANOVA test result. Is the difference significant?

In [None]:
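# The p-value (about 2.6e-15) is far below 0.05, so we reject the null hypothesis
# that all pokemon types have the same mean `Total`. The difference among types is significant.
# ANOVA does not tell us which types differ from one another.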