## Testing the Equality of Variances with F-Test
Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be independent and identically distributed samples from two populations, each with a normal distribution. The expected values for the two populations can be different, and the hypothesis to be tested is that the variances are equal.


**Assumptions:**
- The data in each group or dataset should be a simple random sample from a larger population.
- The populations from which the samples are drawn should be approximately normally distributed.

**Mathematical Formulation:**

Let  
$$\overline{X} = \frac{1}{n} \sum_{i=1}^{n}X_i$$
$$\overline{Y} = \frac{1}{m} \sum_{i=1}^{m}Y_i$$  
be the sample means.

And 
$$s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n}(X_i - \overline{X})^2$$
$$s_Y^2 = \frac{1}{m-1} \sum_{i=1}^{m}(Y_i - \overline{Y})^2$$ 
be the sample variances.

Then the test statistic is given by:

$$ F = \frac{s_X^2}{s_Y^2} $$

The F-test is a common method to compare variances. The test statistic (F-statistic) is calculated as the ratio of variances between groups to variances within groups. The formula for the F-statistic is given by:


Where:
- $ s_X^2 $ is the sample variance of the first group.
- $ s_Y^2 $ is the sample variance of the second group.

The F-statistic follows an F-distribution with n-1 and m-1 degrees of freedom. The hypothesis for the F-test is stated as follows:

- **Null Hypothesis ($H_0$):** The variances are equal across all groups or datasets.
- **Alternative Hypothesis ($H_1$):** The variances are not equal across all groups or datasets.

**Decision Rule:**
If the p-value associated with the F-statistic is less than the chosen significance level (commonly 0.05), we reject the null hypothesis, indicating evidence of unequal variances.

**Note:** The F-test is sensitive to departures from normality, and in some cases, alternative tests like Levene's test or Bartlett's test may be considered.




### Example:

In [1]:
import numpy as np
from scipy import stats

In [12]:
mu_1 = 10
mu_2 = 10.5

sigma_1 = 2
sigma_2 = 3

n_1 = 100
n_2 = 120

x_1 = stats.norm.rvs(size=n_1, loc=mu_1, scale=sigma_1)
x_2 = stats.norm.rvs(size=n_2, loc=mu_2, scale=sigma_2)

In [13]:
x1_hat = np.mean(x_1)
x2_hat = np.mean(x_2)
s_x1 = np.std(x_1, ddof=1)
s_x2 = np.std(x_2, ddof=1)
n = len(x_1)
m = len(x_2)
alpha = 0.05

if s_x1 > s_x2:
    f_statistic = s_x1/s_x2
    df_X = n-1
    df_Y = m-1
else:
    f_statistic = s_x2/s_x1
    df_X = m-1
    df_Y = n-1
# Calculate p-value for two-sided F-test
p_value = 2 * (1 - stats.f.cdf(f_statistic, df_X, df_Y))
f_critical = stats.f.ppf(1-alpha/2, df_X, df_Y)

print(f'f-statistic: {f_statistic: 3.4f} \np-value: {p_value: 3.4f} \nz-critical: {f_critical: 3.4f}')
print('Reject H0:', p_value < alpha)

f-statistic:  1.7439 
p-value:  0.0046 
z-critical:  1.4658
Reject H0: True
