# Bonus Challenge 1 - T-test

In statistics, a t-test is used to test whether the means of two data samples differ significantly. This challenge uses two types of t-test:

* **Student's t-test** (a.k.a. independent or uncorrelated t-test). This test compares samples from **two independent populations** (e.g. test scores of students in two different classes). `scipy` provides the [`ttest_ind`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_ind.html) function to conduct Student's t-test.

* **Paired t-test** (a.k.a. dependent or correlated t-test). This test compares two related samples from **the same population** (e.g. scores of different tests of students in the same class). `scipy` provides the [`ttest_rel`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_rel.html) function to conduct a paired t-test.
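
To make the distinction concrete, here is a minimal sketch that runs both functions on the same pair of arrays. The scores are invented for illustration and are not part of the challenge data. Read as two unrelated classes, the arrays call for `ttest_ind`; read as two tests taken by the same eight students (matched by position), they call for `ttest_rel`.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

# Hypothetical scores, invented purely for illustration.
scores_a = np.array([72, 85, 90, 66, 78, 81, 94, 70])
scores_b = np.array([68, 74, 88, 59, 73, 77, 85, 65])

# Independent samples: treat the arrays as two unrelated groups of students.
independent = ttest_ind(scores_a, scores_b)

# Paired samples: treat the arrays as test 1 and test 2 of the SAME students,
# so element i of each array belongs to the same person.
paired = ttest_rel(scores_a, scores_b)

print(f"ttest_ind p-value: {independent.pvalue:.4f}")
print(f"ttest_rel p-value: {paired.pvalue:.4f}")
```

With these numbers the paired test gives a much smaller p-value, because every student scored lower on the second array and the pairing removes the large student-to-student variation.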

Both functions return a test statistic and a number called the **p-value**. If the p-value is below 0.05, we reject the null hypothesis and call the difference significant. If the p-value is between 0.05 and 0.1, the evidence against the null hypothesis is weaker, and we may reject it only with lower confidence. If the p-value is above 0.1, we do not reject the null hypothesis.
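
The three-way rule above can be sketched as a small helper. The function name `interpret_p_value` and its parameter names are hypothetical, chosen only for this illustration, and treating a p-value of exactly 0.05 as part of the weaker band is an assumption.

```python
def interpret_p_value(p_value, strong=0.05, weak=0.10):
    """Map a p-value to the three-way decision rule described above."""
    if p_value < strong:
        return "reject the null hypothesis"
    if p_value < weak:
        return "reject the null hypothesis, with lower confidence"
    return "do not reject the null hypothesis"


# Example p-values, invented for illustration.
for p in (0.003, 0.07, 0.4):
    print(f"p = {p}: {interpret_p_value(p)}")
```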

Read more about the t-test in [this article](http://b.link/test50) and [this Quora thread](http://b.link/unpaired97). Make sure you understand when to use which type of t-test.

In [1]:
# Import libraries
import numpy as np
import scipy as sp
import pandas as pd
from scipy import stats
from scipy.stats import ttest_rel
from scipy.stats import ttest_ind

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


#### Import dataset

In this challenge we will work on the Pokemon dataset you have already used. The goal is to test whether different groups of pokemon (e.g. Legendary vs Normal, Generation 1 vs 2, single-type vs dual-type) have different stats (e.g. HP, Attack, Defense, etc.). Use Ironhack's database to load the data (db: pokemon, table: pokemon_stats). 

In [2]:
# Your code here:
pokemon = pd.read_csv("/content/drive/MyDrive/[01] Data Analytics - IronHack/[06] Courses/Week 5/Day 21 - Monday/[LAB 7] - Hypothesis Testing/Pokemon.csv")

#### First we want to define a function with which we can test the means of a feature set of two samples. 

In the next cell you'll see a Python function annotated with a description of what it does, its arguments, and its return value. This type of annotation is called a **docstring**, a convention used among Python developers. The docstring convention lets developers write consistent technical documentation for their code so that others can read it. It also allows some tools and websites to parse docstrings automatically and display user-friendly documentation.

Follow the specifications of the docstring and complete the function.

In [3]:
def t_test_features(s1, s2, features=['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']):
    """Test means of a feature set of two samples
    
    Args:
        s1 (dataframe): sample 1
        s2 (dataframe): sample 2
        features (list): an array of features to test
    
    Returns:
        dict: a dictionary of t-test scores for each feature where the feature name is the key and the p-value is the value
    """
    results = {}

    # Your code here
    for feature in features:
        # Independent two-sample t-test on this feature; keep only the p-value
        _, p_value = ttest_ind(s1[feature], s2[feature])
        results[feature] = p_value

    return results

#### Using the `t_test_features` function, conduct a t-test for Legendary vs non-Legendary pokemons.

*Hint: your output should look like below:*

```
{'HP': 1.0026911708035284e-13,
 'Attack': 2.520372449236646e-16,
 'Defense': 4.8269984949193316e-11,
 'Sp. Atk': 1.5514614112239812e-21,
 'Sp. Def': 2.2949327864052826e-15,
 'Speed': 1.049016311882451e-18,
 'Total': 9.357954335957446e-47}
 ```

In [4]:
# Your code here

# separate Legendary and non-Legendary Pokemon
legendary = pokemon[pokemon["Legendary"] == True]
nonLegendary = pokemon[pokemon["Legendary"] == False]

# t-test for Legendary vs. non-Legendary Pokemon using the t_test_features function
t_test_results = t_test_features(legendary, nonLegendary)

print(t_test_results)

{'HP': 3.330647684846191e-15, 'Attack': 7.827253003205333e-24, 'Defense': 1.5842226094427255e-12, 'Sp. Atk': 6.314915770427266e-41, 'Sp. Def': 1.8439809580409594e-26, 'Speed': 2.3540754436898437e-21, 'Total': 3.0952457469652825e-52}
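
The p-values printed above differ from the ones in the hint. One possible explanation, offered here as an assumption rather than something the exercise states, is that the hint was generated with Welch's t-test, which `ttest_ind` performs when called with `equal_var=False`. The default (`equal_var=True`) pools the variances of both groups. The sketch below uses synthetic data, not the Pokemon dataset, to show that the two variants can give different p-values when group sizes and variances differ.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic groups, invented for illustration: one small and widely spread,
# one large and tightly clustered.
rng = np.random.default_rng(0)
small_group = rng.normal(loc=100, scale=30, size=20)
large_group = rng.normal(loc=80, scale=10, size=200)

# Student's t-test: assumes both groups share the same variance (the default).
student = ttest_ind(small_group, large_group)

# Welch's t-test: does not assume equal variances.
welch = ttest_ind(small_group, large_group, equal_var=False)

print(f"Student p-value: {student.pvalue:.3g}")
print(f"Welch p-value:   {welch.pvalue:.3g}")
```

To check this explanation against the hint, one could pass `equal_var=False` inside `t_test_features` and compare the output.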


#### From the test results above, what conclusion can you make? Do Legendary and non-Legendary pokemons have significantly different stats on each feature?

In [5]:
# Your comment here
alpha = 0.05

for feature, p_value in t_test_results.items():
    if p_value < alpha:
        print(f"{feature}: The difference is statistically significant. The null hypothesis is rejected (p-value = {p_value}).\n")
    else:
        print(f"{feature}: The difference is not statistically significant. The null hypothesis is not rejected (p-value = {p_value}).\n")

HP: The difference is statistically significant. The null hypothesis is rejected (p-value = 3.330647684846191e-15).

Attack: The difference is statistically significant. The null hypothesis is rejected (p-value = 7.827253003205333e-24).

Defense: The difference is statistically significant. The null hypothesis is rejected (p-value = 1.5842226094427255e-12).

Sp. Atk: The difference is statistically significant. The null hypothesis is rejected (p-value = 6.314915770427266e-41).

Sp. Def: The difference is statistically significant. The null hypothesis is rejected (p-value = 1.8439809580409594e-26).

Speed: The difference is statistically significant. The null hypothesis is rejected (p-value = 2.3540754436898437e-21).

Total: The difference is statistically significant. The null hypothesis is rejected (p-value = 3.0952457469652825e-52).



We are testing whether there is a significant difference between the stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, and Total) of Legendary and non-Legendary Pokemon.

The null hypothesis for each feature is that the two groups (Legendary and non-Legendary Pokemon) have the same population mean for that feature. In other words, we assume that the average stat for each feature is the same for both Legendary and non-Legendary Pokemon.

By performing the t-test and analyzing the p-values, we aim to determine if we can reject the null hypothesis and conclude that there is a significant difference between the stats of Legendary and non-Legendary Pokemon.
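
To show what the test computes under this null hypothesis, here is a minimal sketch that derives the pooled-variance t statistic and its two-sided p-value by hand and compares them with `ttest_ind`. The helper name `pooled_t_test` and the sample values are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind


def pooled_t_test(x, y):
    """Student's two-sample t-test with pooled variance, computed by hand."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    # t statistic: difference in means divided by its standard error.
    t_stat = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
    p_value = 2 * t_dist.sf(abs(t_stat), df=n1 + n2 - 2)
    return t_stat, p_value


# Hypothetical stat values, invented for illustration.
group_a = [106, 90, 100, 91, 105, 80, 95]
group_b = [45, 60, 80, 39, 58, 78, 44, 65, 50]

manual_t, manual_p = pooled_t_test(group_a, group_b)
scipy_result = ttest_ind(group_a, group_b)

print(f"manual: t = {manual_t:.4f}, p = {manual_p:.3g}")
print(f"scipy:  t = {scipy_result.statistic:.4f}, p = {scipy_result.pvalue:.3g}")
```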

Based on these t-test results, every p-value is far below 0.05, the common threshold for statistical significance; the largest, for Defense, is about 1.6e-12.

This means that we can confidently **reject the null hypothesis** for each feature and conclude that there is a significant difference between the stats of Legendary and non-Legendary Pokemon for all features (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, and Total).

In summary, Legendary and non-Legendary Pokemon have significantly different stats for each feature tested.

#### Next, conduct a t-test for Generation 1 and Generation 2 pokemons.

In [6]:
# Your code here
gen1 = pokemon[pokemon["Generation"] == 1]
gen2 = pokemon[pokemon["Generation"] == 2]

# t-test using the t_test_features function
t_test_results_gen1_gen2 = t_test_features(gen1, gen2)

print(t_test_results_gen1_gen2)

{'HP': 0.13791881412813622, 'Attack': 0.24050968418101457, 'Defense': 0.5407630349194362, 'Sp. Atk': 0.14119788176331508, 'Sp. Def': 0.16781226231606386, 'Speed': 0.0028356954812578704, 'Total': 0.5599140649014442}


#### What conclusions can you make?

In [7]:
# Your comment here
# Reuses t_test_results_gen1_gen2 computed in the previous cell

for feature, p_value in t_test_results_gen1_gen2.items():
    if p_value < 0.05:
        print(f"{feature}: p-value = {p_value:.4f} is less than 0.05, so we can reject the null hypothesis. There is a significant difference in {feature} between Generation 1 and Generation 2 Pokemon.\n")
    else:
        print(f"{feature}: p-value = {p_value:.4f} is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in {feature} between Generation 1 and Generation 2 Pokemon.\n")

HP: p-value = 0.1379 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in HP between Generation 1 and Generation 2 Pokemon.

Attack: p-value = 0.2405 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in Attack between Generation 1 and Generation 2 Pokemon.

Defense: p-value = 0.5408 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in Defense between Generation 1 and Generation 2 Pokemon.

Sp. Atk: p-value = 0.1412 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in Sp. Atk between Generation 1 and Generation 2 Pokemon.

Sp. Def: p-value = 0.1678 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in Sp. Def between Generation 1 and Generation 2 Pokemon.

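Speed: p-value = 0.0028 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Speed between Generation 1 and Generation 2 Pokemon.

Total: p-value = 0.5599 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in Total between Generation 1 and Generation 2 Pokemon.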

In summary, only the Speed stat shows a significant difference between Generation 1 and Generation 2 Pokemon. For the other stats (HP, Attack, Defense, Sp. Atk, Sp. Def, and Total), there is no significant difference between the two generations.
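
As an alternative to printing one sentence per feature, the p-values can be collected into a small table with a significance flag. This is only a sketch: the helper name `summarize_p_values` is hypothetical, and the p-values are copied (rounded to four decimals) from the Generation 1 vs Generation 2 output above.

```python
import pandas as pd


def summarize_p_values(p_values, alpha=0.05):
    """Return a table of p-values, flagged against alpha and sorted ascending."""
    summary = pd.DataFrame({"p_value": pd.Series(p_values)})
    summary["significant"] = summary["p_value"] < alpha
    return summary.sort_values("p_value")


# Rounded p-values from the Generation 1 vs Generation 2 comparison above.
gen_p_values = {
    "HP": 0.1379, "Attack": 0.2405, "Defense": 0.5408, "Sp. Atk": 0.1412,
    "Sp. Def": 0.1678, "Speed": 0.0028, "Total": 0.5599,
}

summary = summarize_p_values(gen_p_values)
print(summary)
```

The same helper would work on the dictionary returned by `t_test_features` for any other pair of groups.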

#### Compare pokemons that have a single type vs those that have two types.

In [13]:
# Your code here
single_type_pokemon = pokemon[pokemon["Type 2"].isnull()]
dual_type_pokemon = pokemon[pokemon["Type 2"].notnull()]

t_test_results_single_dual = t_test_features(single_type_pokemon, dual_type_pokemon)
print(t_test_results_single_dual)

{'HP': 0.11060643144431842, 'Attack': 0.00015741395666164396, 'Defense': 3.250594205757004e-08, 'Sp. Atk': 0.0001454917404035147, 'Sp. Def': 0.00010893304795534396, 'Speed': 0.024051410794037463, 'Total': 1.1749035008828752e-07}


#### What conclusions can you make?

In [12]:
# Your comment here
for feature, p_value in t_test_results_single_dual.items():
    if p_value < 0.05:
        print(f"\n{feature}: p-value = {p_value:.4f} is less than 0.05, so we can reject the null hypothesis. There is a significant difference in {feature} between single-type and dual-type Pokemon.")
    else:
        print(f"\n{feature}: p-value = {p_value:.4f} is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in {feature} between single-type and dual-type Pokemon.")


HP: p-value = 0.1106 is greater than 0.05, so we cannot reject the null hypothesis. There is no significant difference in HP between single-type and dual-type Pokemon.

Attack: p-value = 0.0002 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Attack between single-type and dual-type Pokemon.

Defense: p-value = 0.0000 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Defense between single-type and dual-type Pokemon.

Sp. Atk: p-value = 0.0001 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Sp. Atk between single-type and dual-type Pokemon.

Sp. Def: p-value = 0.0001 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Sp. Def between single-type and dual-type Pokemon.

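Speed: p-value = 0.0241 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Speed between single-type and dual-type Pokemon.

Total: p-value = 0.0000 is less than 0.05, so we can reject the null hypothesis. There is a significant difference in Total between single-type and dual-type Pokemon.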

In conclusion, there is a significant difference in Attack, Defense, Sp. Atk, Sp. Def, Speed, and Total stats between single-type and dual-type Pokemon. However, there is no significant difference in HP between single-type and dual-type Pokemon.

These results suggest that single-type and dual-type Pokemon have statistically significant differences in most of their stats, except for HP. In other words, dual-type Pokemon tend to have different Attack, Defense, Special Attack, Special Defense, Speed, and Total stats compared to single-type Pokemon. 

It's important to note that these results only show that there is a significant difference between the two groups, but they don't provide information about which group is better or has higher stats. To determine that, you would need to analyze the means of the stats for each group. 

The lack of a significant difference in HP indicates that there's no strong evidence to suggest that one group has consistently higher or lower HP than the other. Both single-type and dual-type Pokemon seem to have similar HP values on average.

Overall, these findings could be valuable for Pokemon trainers and players: the number of types is associated with differences in most stats, although the direction of those differences still has to be checked, and HP is unlikely to be a major factor in the decision.

Let's analyze the means of the stats for each group (single-type and dual-type Pokemon) to see the direction of these differences.

In [14]:
single_type_mean = single_type_pokemon[["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed", "Total"]].mean()
dual_type_mean = dual_type_pokemon[["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed", "Total"]].mean()

mean_comparison = pd.DataFrame({"Single_Type": single_type_mean, "Dual_Type": dual_type_mean})
mean_comparison

| | Single_Type | Dual_Type |
|---|---|---|
| HP | 67.766839 | 70.649758 |
| Attack | 74.525907 | 83.173913 |
| Defense | 67.585492 | 79.676329 |
| Sp. Atk | 68.284974 | 77.048309 |
| Sp. Def | 67.974093 | 75.565217 |
| Speed | 65.878238 | 70.514493 |
| Total | 412.015544 | 456.628019 |


Based on the analysis, it can be concluded that dual-type Pokemon, on average, have higher stats in Attack, Defense, Sp. Atk, Sp. Def, Speed, and Total compared to single-type Pokemon. This suggests that dual-type Pokemon may generally be stronger or more versatile than single-type Pokemon, at least in terms of their stats.

However, it's important to remember that this analysis is based on the average values, and individual Pokemon may still have varying stats within each group.

#### Now, we want to test whether there are significant differences between `Attack` and `Defense`, and between `Sp. Atk` and `Sp. Def`, across all pokemons. Please write your code below.

*Hint: are you comparing different populations or the same population?*

In [16]:
# Your code here
attack_defense_pvalue = ttest_rel(pokemon['Attack'], pokemon['Defense']).pvalue
sp_atk_sp_def_pvalue = ttest_rel(pokemon['Sp. Atk'], pokemon['Sp. Def']).pvalue

print(f"Attack vs Defense p-value: {attack_defense_pvalue}")
print(f"Sp. Atk vs Sp. Def p-value: {sp_atk_sp_def_pvalue}")

Attack vs Defense p-value: 1.7140303479358558e-05
Sp. Atk vs Sp. Def p-value: 0.3933685997548122


#### What conclusions can you make?

In [18]:
# Your comment here
alpha = 0.05

if attack_defense_pvalue < alpha:
    print("There is a significant difference between Attack and Defense stats.")
else:
    print("There is no significant difference between Attack and Defense stats.")

if sp_atk_sp_def_pvalue < alpha:
    print("There is a significant difference between Sp. Atk and Sp. Def stats.")
else:
    print("There is no significant difference between Sp. Atk and Sp. Def stats.")

There is a significant difference between Attack and Defense stats.
There is no significant difference between Sp. Atk and Sp. Def stats.


1. Attack vs Defense: The p-value is 1.71e-05, which is less than 0.05. We can reject the null hypothesis, meaning there is a significant difference between the Attack and Defense stats of all Pokemon.

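> The results show a significant difference between Attack and Defense, meaning that on average a Pokemon's Attack differs from its Defense. The paired t-test compares averages across all Pokemon, so it does not by itself show that individual Pokemon trade Attack for Defense or vice versa.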

2. Sp. Atk vs Sp. Def: The p-value is 0.3934, which is greater than 0.05. We cannot reject the null hypothesis, meaning there is no significant difference between the Sp. Atk and Sp. Def stats of all Pokemon.

> On the other hand, there is no significant difference between Sp. Atk and Sp. Def, so the data are consistent with Pokemon having, on average, balanced Special Attack and Special Defense stats.

In summary, there is a significant difference between Attack and Defense stats, while there is no significant difference between Sp. Atk and Sp. Def stats for all Pokemon.
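
One way to see why the paired test fits here: a paired t-test is equivalent to a one-sample t-test on the per-row differences, checking whether their mean is zero. The sketch below illustrates this with synthetic paired values (not the Pokemon dataset); the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Synthetic paired measurements, invented for illustration: each index
# represents one individual with two related stats.
rng = np.random.default_rng(42)
stat_a = rng.normal(loc=80, scale=30, size=50)
stat_b = stat_a + rng.normal(loc=-5, scale=20, size=50)

# Paired t-test on the two columns.
paired = ttest_rel(stat_a, stat_b)

# One-sample t-test on the per-individual differences against a mean of 0.
one_sample = ttest_1samp(stat_a - stat_b, popmean=0)

print(f"ttest_rel:   t = {paired.statistic:.4f}, p = {paired.pvalue:.4g}")
print(f"ttest_1samp: t = {one_sample.statistic:.4f}, p = {one_sample.pvalue:.4g}")
```

Both calls report the same statistic and p-value, which is why `ttest_rel` requires the two inputs to have the same length and row order.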
