📘 Statistics Advance - 1: Assignment Solutions
1. Explain the properties of the F-distribution.
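Right-skewed: The F-distribution is positively skewed, with a long right tail; the skew shrinks as the degrees of freedom increase.

Non-negative: F-values are always ≥ 0, because the statistic is a ratio of variances.

Asymmetric: Unlike the normal distribution, it is not symmetric about its mean.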

Dependent on two degrees of freedom (df1, df2): One for the numerator and one for the denominator.

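Defined as a ratio of scaled chi-squared variables: an F-variable is the ratio of two independent chi-squared random variables, each divided by its own degrees of freedom. This is why it is used to compare variances.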
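The properties above can be checked with a small simulation. This is only a sketch: the degrees of freedom (4 and 10), the sample size, and the random seed are arbitrary choices made for illustration, not values from the assignment.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
df1, df2 = 4, 10                 # illustrative degrees of freedom (assumed)
n_draws = 100_000

# Build F-distributed draws as a ratio of two independent chi-squared
# variables, each divided by its own degrees of freedom
chi_num = rng.chisquare(df1, size=n_draws) / df1
chi_den = rng.chisquare(df2, size=n_draws) / df2
f_samples = chi_num / chi_den

print("smallest draw:", f_samples.min())              # never below zero
print("sample skewness:", stats.skew(f_samples))      # positive means right-skewed
print("sample mean:", f_samples.mean())
print("theoretical mean:", stats.f.mean(df1, df2))    # df2 / (df2 - 2) when df2 > 2
```

Changing df1 and df2 in the sketch shows how the shape depends on both degrees of freedom.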

2. In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?
Used in:

F-test for comparing two variances

ANOVA (Analysis of Variance)

Regression analysis (to test overall model significance)

Why appropriate?

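Under the null hypothesis, and with normally distributed data, the ratio of two independent variance estimates follows an F-distribution.

In ANOVA and regression, the ratio compares variance explained by the groups or the model with unexplained (residual) variance, so a large F suggests that the group means differ or that the model has explanatory power.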

3. Key assumptions for an F-test:
Populations are normally distributed

Samples are independent

Data are quantitative (measured on an interval or ratio scale)

The test is sensitive to violations of normality, especially in small samples.
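As a rough illustration of how these assumptions might be checked, the sketch below applies the Shapiro-Wilk normality test to each sample and runs Levene's test, which is commonly used as a less normality-sensitive way to compare spread. The samples a and b are the ones used in the variance-comparison cell of this notebook. With only five observations per sample these checks have little power, so treat the output as indicative only.

```python
from scipy import stats

# Samples reused from the variance-comparison cell in this notebook
a = [48, 52, 55, 60, 62]
b = [45, 50, 55, 52, 47]

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal population
for name, sample in (("a", a), ("b", b)):
    w_stat, p_norm = stats.shapiro(sample)
    print(f"Shapiro-Wilk for {name}: W = {w_stat:.3f}, p = {p_norm:.3f}")

# Levene's test (median-centred by default in SciPy): the null hypothesis is equal variances
lev_stat, p_lev = stats.levene(a, b)
print(f"Levene: statistic = {lev_stat:.3f}, p = {p_lev:.3f}")
```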

4. Purpose of ANOVA vs t-test:
ANOVA tests whether the means of three or more groups are all equal.

A t-test compares the means of only two groups.

ANOVA avoids the Type I error inflation that would occur if multiple pairwise t-tests were performed.

5. When to use one-way ANOVA instead of multiple t-tests:
When comparing 3 or more groups

Using multiple t-tests increases the chance of false positives (Type I errors).

ANOVA compares all group means in a single test, so the overall significance level stays at α.
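The inflation can be made concrete with a little arithmetic. The sketch below assumes α = 0.05 per test and treats the pairwise tests as independent, which is a simplification (pairwise t-tests on shared groups are correlated), so the numbers are approximate. The function name familywise_error_rate is a hypothetical helper introduced for this example.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one false positive across all pairwise tests.

    Assumes the tests are independent, which is only an approximation.
    """
    n_tests = comb(n_groups, 2)           # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests


for n_groups in (3, 4, 5, 10):
    n_tests = comb(n_groups, 2)
    rate = familywise_error_rate(n_groups)
    print(f"{n_groups} groups -> {n_tests} t-tests, family-wise error rate ~ {rate:.3f}")
```

Even with three groups, the three pairwise tests push the approximate error rate to about 0.143 rather than 0.05.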

6. How ANOVA partitions variance:
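Total variability (SST, the total sum of squares) is split into two parts, so that SST = SSB + SSW:

Between-group sum of squares (SSB): Due to differences between the group means and the grand mean.

Within-group sum of squares (SSW): Due to variability of observations around their own group mean.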

F-statistic formula:

$$F = \frac{\text{MSB}}{\text{MSW}} = \frac{\text{SSB}/(k-1)}{\text{SSW}/(N-k)}$$

where k is the number of groups and N is the total number of observations across all groups.
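The partition can be worked through by hand and compared with SciPy. This sketch reuses the three region samples from the one-way ANOVA cell in this notebook; the variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Region samples reused from the one-way ANOVA cell in this notebook
groups = [
    np.array([160, 162, 165, 158, 164]),
    np.array([172, 175, 170, 168, 174]),
    np.array([180, 182, 179, 185, 183]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, N = len(groups), all_values.size

# Between-group, within-group, and total sums of squares
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
sst = ((all_values - grand_mean) ** 2).sum()

# Mean squares and the F-statistic
msb = ssb / (k - 1)
msw = ssw / (N - k)
f_manual = msb / msw
p_manual = f.sf(f_manual, k - 1, N - k)   # upper-tail probability

f_scipy, p_scipy = f_oneway(*groups)

print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, SST = {sst:.1f}")
print(f"Manual F = {f_manual:.3f}, scipy F = {f_scipy:.3f}")
print(f"Manual p = {p_manual:.3e}, scipy p = {p_scipy:.3e}")
```

The manual F-statistic should agree with the value reported by f_oneway, and SSB + SSW should equal SST up to floating-point rounding.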

7. Classical vs Bayesian ANOVA
Aspect	Classical (Frequentist)	Bayesian
Uncertainty	Parameters are fixed but unknown; probability describes sampling variability	Parameters are treated as random variables with prior and posterior distributions
Interpretation	Decision based on the p-value	Conclusions based on posterior probabilities
Hypothesis Testing	Reject or fail to reject H₀	Estimate the probability of H₀ (or compare models) given the data
Output	F-statistic, p-value	Posterior distributions, Bayes factors



In [1]:
import numpy as np
from scipy.stats import f

# Data
a = [48, 52, 55, 60, 62]
b = [45, 50, 55, 52, 47]

# Sample variances
var_a = np.var(a, ddof=1)
var_b = np.var(b, ddof=1)

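# F-statistic: put the larger sample variance in the numerator so that F >= 1,
# and keep each variance paired with its own degrees of freedom
if var_a >= var_b:
    F, df1, df2 = var_a / var_b, len(a) - 1, len(b) - 1
else:
    F, df1, df2 = var_b / var_a, len(b) - 1, len(a) - 1

# Two-tailed p-value: double the upper-tail probability, capped at 1
p = min(1.0, 2 * f.sf(F, df1, df2))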

print(f"F-statistic: {F:.3f}")
print(f"P-value: {p:.4f}")


F-statistic: 2.089
P-value: 0.4930


In [2]:
from scipy.stats import f_oneway

# Data
region_a = [160, 162, 165, 158, 164]
region_b = [172, 175, 170, 168, 174]
region_c = [180, 182, 179, 185, 183]

# Perform one-way ANOVA
f_stat, p_val = f_oneway(region_a, region_b, region_c)

print(f"F-Statistic: {f_stat:.3f}")
print(f"P-Value: {p_val:.4f}")


F-Statistic: 67.873
P-Value: 0.0000
