### Real-World Case Study

Scenario: A botanist wants to determine whether two species of iris differ in the variability of their sepal length. Differences in sepal length variance across plants can reflect differences in genetic diversity, environmental adaptation, or measurement stability.

### Dataset
The analysis uses the iris.csv dataset, which contains sepal length, sepal width, petal length, petal width, and species for 150 flowers.

### Hypothesis
Null hypothesis (H₀): Variances of sepal length in Setosa and Versicolor are equal.

Alternative hypothesis (Hₐ): Variances of sepal length in Setosa and Versicolor are not equal.

### Method
The F-test was applied by computing the ratio of the larger sample variance to the smaller one.
A two-tailed p-value was then calculated to gauge how likely a variance ratio at least this extreme would be if the null hypothesis were true.
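
As a rough sketch, the statistic and p-value described above can be written out as follows. The notation is an assumption introduced here for illustration: $s_1^2$ and $s_2^2$ are the two sample variances, $\nu_{\text{num}}$ and $\nu_{\text{den}}$ are the degrees of freedom (group size minus one) of the group in the numerator and denominator, and $F_{\nu_{\text{num}},\,\nu_{\text{den}}}$ denotes a random variable following the F-distribution.

$$
F_{\text{obs}} = \frac{\max(s_1^2,\, s_2^2)}{\min(s_1^2,\, s_2^2)},
\qquad
p = 2 \cdot \min\!\Big\{ P\big(F_{\nu_{\text{num}},\,\nu_{\text{den}}} \le F_{\text{obs}}\big),\;
P\big(F_{\nu_{\text{num}},\,\nu_{\text{den}}} \ge F_{\text{obs}}\big) \Big\}
$$

Because the larger variance is placed in the numerator, $F_{\text{obs}} \ge 1$, so in practice the upper-tail probability is usually the smaller of the two terms.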

#### Test Chosen for Me by the Professor:
#### F-Test Implementation

In [None]:
# Importing Dependencies
import pandas as pd
import numpy as np
from scipy.stats import f

# Load Iris dataset from raw CSV URL
url = "https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv"
df = pd.read_csv(url)


# Select the groups
g1 = df[df['species'] == "setosa"]['sepal_length']
g2 = df[df['species'] == "versicolor"]['sepal_length']

# Compute sample variances
var1 = np.var(g1, ddof=1)
var2 = np.var(g2, ddof=1)

print(f"Var(setosa) = {var1:.4f}, Var(versicolor) = {var2:.4f}")

# Compute F-statistic (ratio of larger variance / smaller variance)
F_stat = var2 / var1 if var2 > var1 else var1 / var2
df1 = len(g2) - 1 if var2 > var1 else len(g1) - 1
df2 = len(g1) - 1 if var2 > var1 else len(g2) - 1

# Two-tailed p-value
p_value = 2 * min(f.cdf(F_stat, df1, df2), 1 - f.cdf(F_stat, df1, df2))
print(f"Variance Ratio (F) = {F_stat:.4f}, p-value (two-tailed) = {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject H0: variances differ significantly between Setosa and Versicolor.")
else:
    print("Fail to reject H0: no evidence of different variances between the two species.")


Var(setosa) = 0.1242, Var(versicolor) = 0.2664
Variance Ratio (F) = 2.1443, p-value (two-tailed) = 0.0087
Reject H0: variances differ significantly between Setosa and Versicolor.


#### 💭 Reflection 

---
The F-statistic of 2.1443 means that the sample variance of sepal length in Versicolor is about 2.1 times that of Setosa (0.2664 vs. 0.1242).
This indicates a meaningful difference in spread between the two species.

The p-value (0.0087) is less than 0.05, which gives us enough evidence to reject the null hypothesis and conclude that the variances of sepal length differ significantly between Setosa and Versicolor.
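
To make the reasoning above a little more concrete, the following sketch re-derives the decision from the reported F-statistic alone. It assumes 50 flowers per species (so 49 degrees of freedom per group), which is not stated explicitly in this section, and it uses the rounded value 2.1443 from the printed output. The names `F_obs`, `F_crit`, and `sd_ratio` are illustrative only.

```python
import math
from scipy.stats import f

# Value reported in the printed output (rounded to four decimals).
F_obs = 2.1443

# Assumption: 50 flowers per species, so each group has 49 degrees of freedom.
dfn = dfd = 50 - 1
alpha = 0.05

# Ratio of standard deviations: spread expressed in the original units (cm).
sd_ratio = math.sqrt(F_obs)

# Upper critical value for a two-tailed test with the larger variance on top.
F_crit = f.ppf(1 - alpha / 2, dfn, dfd)

# Two-tailed p-value recomputed from the rounded statistic (upper tail doubled).
p_value = 2 * f.sf(F_obs, dfn, dfd)

print(f"SD ratio = {sd_ratio:.3f}")
print(f"Critical F = {F_crit:.3f}")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if F_obs > F_crit else "Fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision. The standard-deviation ratio is included because it is often easier to interpret than a ratio of variances.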


---


