# ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between the t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

From the ANOVA test, you receive two numbers. The first number is called the **F-value**, which you compare with a critical value to decide whether your null hypothesis can be rejected. The critical F-value that rejects the null hypothesis varies according to the number of total subjects and the number of subject groups in your experiment. In [this table](http://b.link/eda14) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**
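
As an alternative to reading the printed table, the critical value can be computed directly. The following is a minimal sketch, assuming a hypothetical experiment with 3 groups and 30 subjects in total (these numbers are illustrative and not taken from the pokemon data). It uses `scipy.stats.f.ppf`, the inverse CDF of the F distribution.

```python
import scipy.stats as st

# Hypothetical experiment (assumption for illustration only)
n_groups = 3
n_subjects = 30

dfn = n_groups - 1            # between-group degrees of freedom
dfd = n_subjects - n_groups   # within-group degrees of freedom

alpha = 0.05
# The critical value is the point that leaves `alpha` probability in the upper tail
f_critical = st.f.ppf(1 - alpha, dfn, dfd)
print(f"Critical F({dfn}, {dfd}) at alpha={alpha}: {f_critical:.3f}")
```

An observed F-value above `f_critical` would lead to rejecting the null hypothesis at the chosen `alpha`.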

The p-value is the other number yielded by ANOVA; it already takes the number of total subjects and the number of experiment groups into consideration. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
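
To see how the p-value relates to the F-value, here is a small sketch on made-up numbers (the three groups below are hypothetical, not pokemon data). It runs `scipy.stats.f_oneway` and then recomputes the p-value as the upper-tail probability of the F distribution with the matching degrees of freedom.

```python
import math
import scipy.stats as st

# Hypothetical measurements for three groups (illustrative values only)
group_a = [10, 12, 11, 13, 12]
group_b = [11, 13, 12, 14, 12]
group_c = [20, 22, 21, 23, 21]

f_value, p_value = st.f_oneway(group_a, group_b, group_c)

# The p-value is the upper-tail area of the F distribution beyond f_value
dfn = 3 - 1    # number of groups - 1
dfd = 15 - 3   # total subjects - number of groups
p_from_f = st.f.sf(f_value, dfn, dfd)

alpha = 0.05
reject_null = p_value < alpha
print(f_value, p_value, math.isclose(p_value, p_from_f, rel_tol=1e-6), reject_null)
```

Because the degrees of freedom are already folded into the p-value, comparing it with 0.05 is enough; no table lookup is needed.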

In this challenge, we want to understand whether the `Total` value differs significantly among the various types of pokemons, i.e. Grass vs Poison vs Fire vs Dragon... There are many types of pokemons, which makes this a perfect use case for ANOVA. Use Ironhack's database to load the pokemon data (db: pokemon, table: pokemon_stats).

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import scipy.stats as st

In [3]:
# Load the data:
data = pd.read_csv('pokemon.csv')
data

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19 including `NaN`. You can disregard `NaN` in next step.*
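
One possible way to follow this hint is sketched below on a tiny hypothetical frame (`toy` and `unique_types_sketch` are names invented for this illustration). It stacks the two type columns into one Series and takes the unique values, so `NaN` shows up once alongside the type names.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the pokemon table (illustrative rows only)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
})

# Stack both columns, then keep each value once (NaN included)
all_types = pd.concat([toy["Type 1"], toy["Type 2"]], ignore_index=True)
unique_types_sketch = all_types.unique()

print(len(unique_types_sketch))
```

On the full dataset the same two lines, applied to `data`, should produce the 19 values mentioned in the hint, assuming the columns are named `Type 1` and `Type 2` as shown above.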

In [4]:
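def combo(value):
    """Return 'Type 1' alone, or 'Type 1-Type 2' when a second type exists."""
    type1 = value['Type 1']
    type2 = value['Type 2']
    # A missing second type is stored as NaN
    if pd.isna(type2):
        return type1
    return type1 + '-' + type2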
    
data['Combo Types'] = data.apply(combo, axis=1)
#data['Combo Types'] = data['Combo Types'].str.split('-')


In [None]:
#unique_types = []
#for x in data['Type 1']:
#    if x not in unique_types:
#        unique_types.append(x)
#
#len(unique_types)

In [None]:
#We concluded that it makes more sense to look at all combos instead of the single types 

In [5]:
# unique() keeps the values in order of first appearance
unique_types_combo = data['Combo Types'].unique().tolist()

unique_types_combo

['Grass-Poison',
 'Fire',
 'Fire-Flying',
 'Fire-Dragon',
 'Water',
 'Bug',
 'Bug-Flying',
 'Bug-Poison',
 'Normal-Flying',
 'Normal',
 'Poison',
 'Electric',
 'Ground',
 'Poison-Ground',
 'Fairy',
 'Normal-Fairy',
 'Poison-Flying',
 'Bug-Grass',
 'Fighting',
 'Water-Fighting',
 'Psychic',
 'Water-Poison',
 'Rock-Ground',
 'Water-Psychic',
 'Electric-Steel',
 'Water-Ice',
 'Ghost-Poison',
 'Grass-Psychic',
 'Ground-Rock',
 'Grass',
 'Psychic-Fairy',
 'Ice-Psychic',
 'Water-Flying',
 'Water-Dark',
 'Rock-Water',
 'Rock-Flying',
 'Ice-Flying',
 'Electric-Flying',
 'Dragon',
 'Dragon-Flying',
 'Psychic-Fighting',
 'Water-Electric',
 'Fairy-Flying',
 'Psychic-Flying',
 'Electric-Dragon',
 'Water-Fairy',
 'Rock',
 'Grass-Flying',
 'Water-Ground',
 'Dark',
 'Dark-Flying',
 'Ghost',
 'Normal-Psychic',
 'Bug-Steel',
 'Ground-Flying',
 'Steel-Ground',
 'Bug-Rock',
 'Bug-Fighting',
 'Dark-Ice',
 'Fire-Rock',
 'Ice-Ground',
 'Water-Rock',
 'Steel-Flying',
 'Dark-Fire',
 'Water-Dragon',
 'Rock-Dar

#### Second we will create a list named `pokemon_totals` to contain the `Total` values of each unique type of pokemons.

Why do we use a list instead of a dictionary to store the pokemon `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which group(s) are significantly different. Therefore, we don't need to know which `Total` belongs to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float` variable which you can find out by using `type()`. The valid pokemon type values are all of the `str` type.
* At the end, the length of your `pokemon_totals` should be 18.
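
The hints above can be sketched as follows on a tiny hypothetical frame (`toy`, `unique_types_sketch` and `pokemon_totals_sketch` are illustrative names). This sketch assumes that a dual-type pokemon counts towards both of its types, which is one reading of the hint and differs from grouping by type combination.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the pokemon table (illustrative rows only)
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "Total": [318, 309, 314, 534],
})
unique_types_sketch = pd.concat([toy["Type 1"], toy["Type 2"]]).unique()

pokemon_totals_sketch = []
for poke_type in unique_types_sketch:
    # NaN is a float, valid type names are str: skip anything that is not a str
    if not isinstance(poke_type, str):
        continue
    # Assumption: a pokemon belongs to a type if either column matches
    mask = (toy["Type 1"] == poke_type) | (toy["Type 2"] == poke_type)
    pokemon_totals_sketch.append(toy.loc[mask, "Total"].tolist())

print(len(pokemon_totals_sketch))
```

The result is a list of lists, one inner list per type, which is the shape `f_oneway` expects once the outer list is unpacked.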

In [6]:
len(data)

800

In [7]:
data['Total']

0      318
1      405
2      525
3      625
4      309
      ... 
795    600
796    700
797    600
798    680
799    600
Name: Total, Length: 800, dtype: int64

In [None]:
#pokemon_totals=[]
#for x in unique_types_combo:
#    total=0
#    for i in range(0,len(data)):
#        #print(i)
#        if data['Combo Types'][i]==x:
#            total+=data['Total'][i]
#    pokemon_totals.append(total)
#            
#    
#pokemon_totals


In [8]:

list_of_totals = []

# Collect the `Total` values of every pokemon in each type combination
for x in unique_types_combo:
    pokemon_totals = data.loc[data['Combo Types'] == x, 'Total'].tolist()
    list_of_totals.append(pokemon_totals)

            
    
list_of_totals


[[318, 405, 525, 625, 320, 395, 490, 300, 390, 490, 400, 280, 515, 294, 464],
 [309,
  405,
  299,
  505,
  350,
  555,
  410,
  500,
  495,
  525,
  309,
  405,
  534,
  250,
  365,
  580,
  310,
  470,
  309,
  540,
  308,
  316,
  498,
  315,
  480,
  484,
  307,
  409],
 [534, 634, 580, 680, 382, 499],
 [634],
 [314,
  405,
  530,
  630,
  320,
  500,
  300,
  385,
  325,
  305,
  325,
  475,
  295,
  440,
  320,
  450,
  340,
  200,
  525,
  314,
  405,
  530,
  500,
  300,
  480,
  580,
  310,
  400,
  500,
  308,
  200,
  540,
  345,
  485,
  485,
  330,
  670,
  770,
  314,
  405,
  330,
  495,
  325,
  330,
  460,
  480,
  600,
  308,
  413,
  528,
  316,
  498,
  294,
  460,
  470,
  314,
  405,
  330,
  500],
 [195,
  205,
  500,
  290,
  195,
  205,
  205,
  400,
  400,
  194,
  384,
  224,
  315,
  305,
  495,
  200,
  213],
 [395, 500, 600, 265, 390, 390, 395, 414, 456, 424, 244, 474, 515, 411],
 [195, 205, 395, 495, 305, 450, 250, 390, 385, 260, 360, 485],
 [251,
  349,


In [None]:
#poke_data={'type': pokemon_group, 'totals': pokemon_totals}
#poke_data=pd.DataFrame(poke_data)
#poke_data

In [None]:
#new = {'type': unique_types_combo, 'totals': pokemon_totals}
#new = pd.DataFrame(new)
#new

In [None]:
#len(pokemon_totals)

In [None]:
#data[data['Combo Types']=='Grass-Poison']['Total'].sum()

#### Now we run ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `stats.f_oneway(*pokemon_totals)`. This unpacks the list and supplies each list item as a separate argument to `f_oneway`.
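
A short sketch of the `*` trick, using made-up groups (the numbers are illustrative, not real results): unpacking the outer list is equivalent to passing each inner list as its own argument.

```python
import scipy.stats as st

# Hypothetical groups of `Total` values (illustrative only)
groups = [
    [300, 410, 520],
    [310, 400, 530],
    [320, 405, 525],
]

# `*groups` expands to f_oneway(groups[0], groups[1], groups[2])
unpacked = st.f_oneway(*groups)
explicit = st.f_oneway(groups[0], groups[1], groups[2])

print(unpacked.statistic == explicit.statistic, unpacked.pvalue == explicit.pvalue)
```

Unpacking is convenient here because the number of groups is only known at run time.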

In [9]:
st.f_oneway(*list_of_totals)

F_onewayResult(statistic=2.0538240752912196, pvalue=5.865343391415285e-10)

#### Interpret the ANOVA test result. Is the difference significant?

In [None]:
# H0 for ANOVA is always that the means of the various groups are the same
# H1 is that they are not the same


In [None]:
# The p-value is far below 0.05, so we reject H0: we have reason to believe that the mean
# total stats differ significantly among the pokemon type combinations.