# ANOVA

In statistics, **Analysis of Variance (ANOVA)** is used to analyze the differences among group means. The difference between the t-test and ANOVA is that the former is used to compare two groups, whereas the latter is used to compare three or more groups. [Read more about the difference between t-test and ANOVA](http://b.link/anova24).

From the ANOVA test, you receive two numbers. The first number is called the **F-value**, the ratio of between-group variance to within-group variance. It indicates whether your null hypothesis can be rejected: you reject it when the F-value exceeds a critical value. The critical F-value varies with the total number of subjects and the number of subject groups in your experiment. In [this table](http://b.link/eda14) you can find the critical values of the F distribution. **If you are confused by the massive F-distribution table, don't worry. Skip the F-value for now and study it at a later time. In this challenge you only need to look at the p-value.**

The p-value is the other number returned by ANOVA. It already takes the total number of subjects and the number of experiment groups into account. **Typically, if your p-value is less than 0.05, you can reject the null hypothesis.**
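To make the link between the F-value, the critical F-value, and the p-value concrete, here is a minimal sketch. The three groups are made-up numbers (not pokemon data), and the variable names are only illustrative. It looks up the critical value with `scipy.stats.f.ppf` instead of the printed table and recomputes the p-value from the F-value with `scipy.stats.f.sf`.

```python
from scipy import stats

# Three small made-up groups (illustrative values only, not pokemon data)
group_a = [10, 12, 11, 13, 9]
group_b = [14, 15, 13, 16, 14]
group_c = [10, 11, 9, 12, 10]
groups = [group_a, group_b, group_c]

f_value, p_value = stats.f_oneway(*groups)

# Degrees of freedom: (number of groups - 1) and (total subjects - number of groups)
k = len(groups)
n_total = sum(len(g) for g in groups)
dfn, dfd = k - 1, n_total - k

# Critical F-value at the 0.05 significance level (what the printed table lists)
f_critical = stats.f.ppf(0.95, dfn, dfd)

# The p-value is the upper-tail probability of the F distribution at f_value
p_from_f = stats.f.sf(f_value, dfn, dfd)

print("F-value:", f_value)
print("Critical F at alpha=0.05:", f_critical)
print("p-value from f_oneway:", p_value)
print("p-value recomputed from F:", p_from_f)
```

Comparing the F-value with the critical value and comparing the p-value with 0.05 lead to the same decision, which is why looking only at the p-value is enough in this challenge.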

In this challenge, we want to find out whether the `Total` value differs significantly among pokemon types, i.e. Grass vs Poison vs Fire vs Dragon... There are many pokemon types, which makes this a good use case for ANOVA.

In [1]:
# Import libraries

import numpy as np
import pandas as pd
from scipy import stats

In [2]:
# Load the data:
df= pd.read_csv('pokemon.txt', sep=',')
df

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


**To achieve our goal, we use three steps:**

1. **Extract the unique values of the pokemon types.**

1. **Select dataframes for each unique pokemon type.**

1. **Conduct ANOVA analysis across the pokemon types.**

#### First let's obtain the unique values of the pokemon types. These values should be extracted from Type 1 and Type 2 aggregated. Assign the unique values to a variable called `unique_types`.

*Hint: the correct number of unique types is 19, including `NaN`. You can disregard `NaN` in the next step.*

In [3]:
# Your code here

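# Series.append was removed in pandas 2.0, so combine the two columns with pd.concat
unique_types = pd.concat([df['Type 1'], df['Type 2']], ignore_index=True).unique()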
unique_types





array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying', nan], dtype=object)
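If you prefer to discard `NaN` right away instead of skipping it later, one option is to call `dropna()` before `unique()`. The sketch below uses a tiny made-up frame (`toy`, with hypothetical rows) that has the same two type columns as the pokemon data.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with the same two type columns as the pokemon data
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
})

# Stack both columns, drop missing values, then take the unique labels
toy_types = (
    pd.concat([toy["Type 1"], toy["Type 2"]], ignore_index=True)
    .dropna()
    .unique()
)
print(toy_types)
```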

#### Second, we will create a list named `pokemon_totals` that contains the `Total` values of each unique pokemon type.

Why do we use a list instead of a dictionary to store the `Total` values? Because ANOVA only tells us whether there is a significant difference among the group means; it does not tell us which groups differ. Therefore, we don't need to know which `Total` values belong to which pokemon type.

*Hints:*

* Loop through `unique_types` and append the selected type's `Total` values to `pokemon_totals`. Be sure to check BOTH `Type 1` and `Type 2` to cover all occurrences of each unique type.
* Skip the `NaN` value in `unique_types`. `NaN` is a `float`, which you can check with `type()`. The valid pokemon type values are all of type `str`.
* At the end, the length of your `pokemon_totals` should be 18.
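The hints above can be sketched on a small made-up frame. This is only one possible approach, assuming a boolean mask over both type columns; `toy`, `toy_types`, and `toy_groups` are hypothetical names and the rows are not real pokemon data.

```python
import numpy as np
import pandas as pd

# Made-up rows with the three columns the hints refer to
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass"],
    "Type 2": ["Poison", np.nan, "Grass", "Fire"],
    "Total": [318, 309, 314, 405],
})
toy_types = ["Grass", "Fire", "Water", "Poison"]

toy_groups = []
for t in toy_types:
    # A row belongs to the group if the type appears in either column
    mask = (toy["Type 1"] == t) | (toy["Type 2"] == t)
    toy_groups.append(toy.loc[mask, "Total"].dropna().tolist())

print(toy_groups)
```

Each element of `toy_groups` is a list of `Total` values for one type, which is the shape `f_oneway` expects once the list is unpacked.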

In [4]:
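# Note: the instructions could mean either the number of pokemons of each type
# or the sum of the `Total` column for each type. This cell assumes the sum of
# `Total` per type. The ANOVA cell further down rebuilds `pokemon_totals` as one
# list of individual `Total` values per type, which is what f_oneway needs.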


# Create an empty list to store the Total values
pokemon_totals = []

# Get all unique types from both 'Type 1' and 'Type 2'
all_types = pd.concat([df['Type 1'], df['Type 2']], ignore_index=True).unique()

# Loop through each unique type
for unique_type in all_types:
    if isinstance(unique_type, str):  # Check if the type is a valid string
        type_total = 0  # Initialize the total for the current type to 0
    
        # Loop through 'df' to calculate the total for the current type
        for index, row in df.iterrows():
            if unique_type in [row['Type 1'], row['Type 2']]:
                if not pd.isna(row['Total']):  # Skip NaN values
                    type_total += row['Total']
    
        # Append the total for the current type to the list
        pokemon_totals.append(type_total)

# Pair each valid type name with its total (filter out NaN first so the pairs stay aligned)
valid_types = [t for t in all_types if isinstance(t, str)]
results_with_types = list(zip(valid_types, pokemon_totals))


results_with_types


[('Grass', 39703),
 ('Fire', 29895),
 ('Water', 54066),
 ('Bug', 27326),
 ('Normal', 41011),
 ('Poison', 24657),
 ('Electric', 22242),
 ('Ground', 29552),
 ('Fairy', 16637),
 ('Fighting', 24916),
 ('Psychic', 42938),
 ('Rock', 26050),
 ('Ghost', 20096),
 ('Ice', 17763),
 ('Dragon', 27088),
 ('Dark', 23506),
 ('Steel', 23843),
 ('Flying', 45837)]

#### Now we run the ANOVA test on `pokemon_totals`.

*Hints:*

* To conduct ANOVA, you can use `scipy.stats.f_oneway()`. Here's the [reference](http://b.link/scipy44).

* What if `f_oneway` throws an error because it does not accept `pokemon_totals` as a list? The trick is to add a `*` in front of `pokemon_totals`, e.g. `stats.f_oneway(*pokemon_totals)`. This unpacks the list and passes each list item to `f_oneway` as a separate argument.
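As a small illustration of the `*` trick, the sketch below runs `f_oneway` on three made-up groups, once with the list unpacked and once with each group written out by hand. Both calls receive the same arguments, so they should return the same result.

```python
from scipy import stats

# Three made-up groups stored in one list (illustrative values only)
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

# Unpacking the list with * ...
unpacked = stats.f_oneway(*groups)

# ... is equivalent to passing each group as its own argument
explicit = stats.f_oneway(groups[0], groups[1], groups[2])

print(unpacked.statistic, unpacked.pvalue)
print(explicit.statistic, explicit.pvalue)
```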

Null Hypothesis (H0):
The mean Total value is the same for all Pokémon types.

Alternative Hypothesis (Ha):
At least one Pokémon type has a mean Total value that differs from the others.

In [6]:


# Create an empty list to store the Total values for each Pokémon type
pokemon_totals = []

# Get all unique types from both 'Type 1' and 'Type 2'
unique_types = pd.concat([df['Type 1'], df['Type 2']], ignore_index=True).unique()

# Loop through each unique type and create a group of Total values
for unique_type in unique_types:
    if isinstance(unique_type, str):  # Check if the type is a valid string
        type_totals = []  # Initialize a list to store the Total values for the current type

        # Loop through 'df' to gather Total values for the current type
        for index, row in df.iterrows():
            if unique_type in [row['Type 1'], row['Type 2']]:
                if not pd.isna(row['Total']):  # Skip NaN values
                    type_totals.append(row['Total'])

        # Append the list of Total values for the current type to the main list
        pokemon_totals.append(type_totals)

# Perform ANOVA test on the list of Total values for each Pokémon type
f_statistic, p_value = stats.f_oneway(*pokemon_totals)

# Print the results
print("F-Statistic:", f_statistic)
print("P-Value:", p_value)




F-Statistic: 6.6175382960055344
P-Value: 2.6457458815984803e-15


#### Interpret the ANOVA test result. Is the difference significant?

In [7]:
# Your comment here
# Interpret the ANOVA test result. Is the difference significant?
if p_value < 0.05:
    print("The ANOVA test suggests that there is a significant difference in Total values between Pokémon types.")
else:
    print("The ANOVA test does not suggest a significant difference in Total values between Pokémon types.")


The ANOVA test suggests that there is a significant difference in Total values between Pokémon types.


The p-value, 2.6457458815984803e-15, is written in scientific notation: it is roughly 2.6 × 10^-15, far below the 0.05 threshold.

A p-value this small is strong evidence against the null hypothesis, so we conclude that the mean Total values differ significantly among Pokémon types. Note that the p-value measures statistical significance, not the size of the difference, and ANOVA alone does not tell us which types differ from each other.
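If scientific notation is hard to read, Python's format specifiers can display the same number in other ways. This is a minimal sketch that copies the p-value printed by the ANOVA cell; `alpha` is just an illustrative name for the 0.05 threshold.

```python
# p-value copied from the ANOVA output above
p_value = 2.6457458815984803e-15
alpha = 0.05  # conventional significance threshold

# Compact scientific notation with two decimals
print(f"{p_value:.2e}")

# Fixed-point notation with 20 decimal places
print(f"{p_value:.20f}")

# The comparison that drives the conclusion
print(p_value < alpha)
```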