# Hypothesis Testing with a t-Test  
Dataset: Palmer Penguins  

## Hypothesis  
**H₀:** The mean flipper length of Adelie penguins is equal to that of Gentoo penguins.  
**H₁:** The mean flipper length of Adelie penguins is different from that of Gentoo penguins.
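
In symbols, writing $\mu_A$ and $\mu_G$ (notation introduced here, not part of the dataset) for the population mean flipper lengths of Adelie and Gentoo penguins, the two-sided test can be stated as:

$$
H_0:\ \mu_A = \mu_G \qquad \text{vs.} \qquad H_1:\ \mu_A \neq \mu_G
$$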


In [None]:

import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)


In [None]:

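# Load the Palmer Penguins data from the seaborn-data repository
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
df = pd.read_csv(url)
# Drop only rows missing the columns this analysis uses; a bare dropna()
# would also discard penguins that lack unrelated fields such as sex
df = df.dropna(subset=["species", "flipper_length_mm"]).reset_index(drop=True)
df.head()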


In [None]:

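# Select the flipper lengths (in mm) of the two species under test
adelie = df.loc[df["species"] == "Adelie", "flipper_length_mm"]
gentoo = df.loc[df["species"] == "Gentoo", "flipper_length_mm"]

# Show the summary statistics side by side
pd.concat([adelie.describe(), gentoo.describe()], axis=1, keys=["Adelie", "Gentoo"])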


In [None]:

sns.boxplot(data=df, x="species", y="flipper_length_mm")
plt.title("Flipper Length by Species")
plt.show()


In [None]:

# Welch's t-test: independent two-sample t-test that does not assume
# equal variances (equal_var=False); the test is two-sided by default
t_stat, p_value = stats.ttest_ind(
    adelie,
    gentoo,
    equal_var=False
)

t_stat, p_value



## Interpretation  

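- We use an **independent two-sample t-test** because:
  - The samples are independent: each penguin belongs to exactly one species, so the two groups share no individuals.
  - We do not assume equal population variances, so we use Welch’s variant (`equal_var=False`).

- **Result:**  
  - The p-value is far below the 0.05 significance level (p ≪ 0.05), so we reject the null hypothesis.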
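
To make the Welch variant concrete, the statistic can be sketched by hand. The helper below is a minimal illustration, not SciPy's implementation: `welch_t` is a hypothetical name, it relies only on the standard library, and the toy lists are made-up values rather than penguin measurements. It divides the difference in sample means by the square root of the summed per-group squared standard errors, and estimates the degrees of freedom with the Welch-Satterthwaite approximation.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration; accepts any iterable of numbers.
    """
    a = [float(x) for x in sample_a]
    b = [float(x) for x in sample_b]
    n_a, n_b = len(a), len(b)

    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b

    # Difference in means scaled by the combined standard error.
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, dof


# Made-up toy values, NOT penguin measurements.
toy_a = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_toy, dof_toy = welch_t(toy_a, toy_b)
print(t_toy, dof_toy)
```

Called as `welch_t(adelie, gentoo)`, the first returned value should agree with `t_stat` from `stats.ttest_ind(..., equal_var=False)` up to floating-point error.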

### Conclusion  
There is a statistically significant difference in mean flipper length between Adelie and Gentoo penguins.  
This is consistent with biological expectations and shows that species is associated with flipper length. Statistical significance alone does not indicate how large the difference is.

## Outlook  
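- Extend to a one-way ANOVA to compare all three species (Adelie, Chinstrap, Gentoo)  
- Report an effect size (Cohen’s d) to quantify how large the difference is  
- Use the results as motivation for classification models that predict species from body measurements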
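
As a starting point for the effect-size item, Cohen's d can be sketched as the difference in means divided by a pooled standard deviation. This is one common definition, chosen here as an assumption; other variants exist. The name `cohens_d` is hypothetical and the toy lists are made-up values, not penguin measurements.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled standard deviation (hypothetical helper)."""
    a = [float(x) for x in sample_a]
    b = [float(x) for x in sample_b]
    n_a, n_b = len(a), len(b)

    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)

    # Standardized difference in means; the sign follows (a - b).
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Made-up toy values, NOT penguin measurements.
toy_a = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_b = [2.0, 4.0, 6.0, 8.0, 10.0]
d_toy = cohens_d(toy_a, toy_b)
print(d_toy)
```

In the notebook this would be called as `cohens_d(adelie, gentoo)`; a negative value means the first group has the smaller mean.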
