# 1) Explain the properties of the F-distribution 

The F-distribution is a probability distribution used in statistical tests to compare variances or evaluate multiple group means. Key properties include:

- Shape: It is asymmetric and skewed to the right, and it becomes more symmetric as the degrees of freedom increase.
- Parameters: It depends on two degrees of freedom: $df_1$ for the numerator and $df_2$ for the denominator.
- Range: Values are always non-negative, ranging from 0 to $\infty$.
- Application: Used in analysis of variance (ANOVA) and F-tests to test hypotheses involving ratios of variances.
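These properties can be checked numerically. The sketch below uses `scipy.stats.f` with a few arbitrary degrees-of-freedom pairs (chosen only for illustration). It confirms three things: no probability mass lies at or below 0, the mean equals $df_2/(df_2-2)$ for $df_2 > 2$, and the skewness shrinks as both degrees of freedom grow. The helper name `f_summary` is hypothetical.

```python
from scipy import stats

def f_summary(df1, df2):
    """Return P(F <= 0), the mean, and the skewness of an F(df1, df2) distribution."""
    dist = stats.f(df1, df2)
    mean, skew = dist.stats(moments="ms")
    return {
        "cdf_at_0": float(dist.cdf(0.0)),  # probability of a value <= 0
        "mean": float(mean),               # equals df2 / (df2 - 2) when df2 > 2
        "skew": float(skew),               # defined only when df2 > 6
    }

# Illustrative (numerator, denominator) degrees of freedom
for df1, df2 in [(5, 10), (20, 40), (100, 200)]:
    s = f_summary(df1, df2)
    print(f"F({df1}, {df2}): P(F <= 0) = {s['cdf_at_0']:.1f}, "
          f"mean = {s['mean']:.3f}, skewness = {s['skew']:.3f}")
```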


# 2) In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?

The F-distribution is used in:

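- ANOVA: To compare the means of more than two groups by analyzing a ratio of variances.
- F-tests: To test whether two population variances are equal.
- Regression analysis: To test the overall significance of a model by comparing explained and unexplained variance.

Appropriateness: Under the null hypothesis and the normality assumption, each variance estimate (scaled by the common population variance) behaves like a chi-square variable divided by its degrees of freedom. The ratio of two independent such quantities follows an F-distribution, which makes it the natural reference distribution for tests that compare variances or variance-based quantities.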
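For regression, the overall F-statistic is the explained mean square divided by the residual mean square. The sketch below computes it for a simple linear regression on made-up data (the `x` and `y` values and the helper name `regression_f_test` are hypothetical). With a single predictor, this F equals the square of the slope's t-statistic, which provides a convenient cross-check against `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats

def regression_f_test(x, y):
    """Overall F-test for a simple linear regression y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line
    y_hat = intercept + slope * x
    ss_reg = np.sum((y_hat - y.mean()) ** 2)     # explained sum of squares
    ss_res = np.sum((y - y_hat) ** 2)            # unexplained (residual) sum of squares
    df_reg, df_res = 1, n - 2                    # one predictor, two fitted parameters
    f_stat = (ss_reg / df_reg) / (ss_res / df_res)
    p_value = stats.f.sf(f_stat, df_reg, df_res) # upper-tail probability
    return f_stat, p_value

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2]
f_stat, p_value = regression_f_test(x, y)
print(f"F = {f_stat:.2f}, p = {p_value:.2g}")
```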

# 3) What are the key assumptions required for conducting an F-test to compare the variances of two populations?

Key assumptions include:

- Normality: Both populations should be normally distributed. The variance F-test is highly sensitive to departures from normality.
- Independence: The two samples must be independent of each other.
- Random sampling: Data should be collected by random sampling.
- No outliers: Outliers can heavily inflate variance estimates and distort the results.
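The normality assumption is often screened with a Shapiro-Wilk test before running the variance F-test. The sketch below applies `scipy.stats.shapiro` to the income samples from question 8. For comparison, it also runs Levene's test (`scipy.stats.levene`), a commonly used equal-variance test that is less sensitive to non-normality. The helper name `check_variance_test_assumptions` is hypothetical. With only five observations per group, these checks have little power, so treat them as rough screens rather than confirmations.

```python
from scipy import stats

profession_a = [48, 52, 55, 60, 62]
profession_b = [45, 50, 55, 52, 47]

def check_variance_test_assumptions(a, b, alpha=0.05):
    """Screen the normality assumption and run a more robust equal-variance test."""
    results = {}
    for name, sample in (("a", a), ("b", b)):
        _, p = stats.shapiro(sample)               # H0: sample comes from a normal distribution
        results[f"shapiro_p_{name}"] = float(p)
        results[f"normal_{name}"] = bool(p >= alpha)  # failing to reject H0 is not proof of normality
    _, lev_p = stats.levene(a, b)                  # H0: equal variances (median-centered by default)
    results["levene_p"] = float(lev_p)
    return results

print(check_variance_test_assumptions(profession_a, profession_b))
```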




# 4) What is the purpose of ANOVA, and how does it differ from a t-test?


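Purpose of ANOVA: ANOVA (analysis of variance) tests whether there are statistically significant differences among the means of three or more groups by comparing between-group variance with within-group variance.

Difference:

- t-test: Compares the means of two groups.
- ANOVA: Compares the means of three or more groups in a single test, which keeps the overall Type I error rate at the chosen significance level instead of letting it grow as it does with multiple pairwise t-tests.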
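The two tests are closely related: with exactly two groups, a one-way ANOVA and the equal-variance (Student's) t-test are equivalent, with $F = t^2$ and identical p-values. The sketch below checks this with `scipy.stats.f_oneway` and `scipy.stats.ttest_ind` on hypothetical data.

```python
from scipy import stats

# Hypothetical measurements for two groups
group_1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_2 = [6.2, 6.5, 5.9, 6.8, 7.1, 6.4]

anova = stats.f_oneway(group_1, group_2)
ttest = stats.ttest_ind(group_1, group_2, equal_var=True)  # Student's t-test with pooled variance

print(f"ANOVA:  F = {anova.statistic:.4f}, p = {anova.pvalue:.4g}")
print(f"t-test: t^2 = {ttest.statistic ** 2:.4f}, p = {ttest.pvalue:.4g}")
```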


# 5) Explain when and why you would use a one-way ANOVA instead of multiple t-tests when comparing more than two groups.

A one-way ANOVA is used when comparing means of three or more groups with a single independent variable.

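- Why not multiple t-tests? Each t-test carries its own risk of a Type I error (false positive), so running many of them increases the overall chance of at least one false positive.
- Advantage of ANOVA: It evaluates all groups in a single test, keeping the overall Type I error rate at the chosen significance level and summarizing the comparison in one F-statistic.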
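The inflation of the Type I error rate can be quantified. The sketch below counts the pairwise t-tests needed for $k$ groups, $\binom{k}{2}$, and computes the familywise error rate $1 - (1 - \alpha)^m$. It assumes, as a simplification, that the $m$ tests are independent; real pairwise comparisons share data, so the figure is an approximation. The helper name `familywise_error_rate` is hypothetical.

```python
from math import comb

def familywise_error_rate(k_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise t-tests,
    assuming (as a simplification) that the tests are independent."""
    m = comb(k_groups, 2)  # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups: {m} t-tests, familywise error rate = {fwer:.3f}")
```

With $\alpha = 0.05$, three groups already give roughly a 14% chance of at least one false positive, and five groups roughly 40%.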


# 6) Explain how variance is partitioned in ANOVA into between-group variance and within-group variance. How does this partitioning contribute to the calculation of the F-statistic? 

ANOVA partitions the total sum of squares into two parts:

- Between-group variation: differences between the group means and the grand mean ($SS_{between}$).
- Within-group variation: differences between individual observations and their own group mean ($SS_{within}$).

so that $SS_{total} = SS_{between} + SS_{within}$.

F-statistic:

$$F = \frac{MS_{between}}{MS_{within}}$$

where

$$MS_{between} = \frac{SS_{between}}{df_{between}}, \qquad MS_{within} = \frac{SS_{within}}{df_{within}}$$

with $df_{between} = k - 1$ and $df_{within} = N - k$ for $k$ groups and $N$ total observations.

A larger F indicates greater differences between group means relative to within-group variability.
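The partition can be computed directly. The sketch below (the function name `one_way_anova` and the three groups of data are hypothetical) computes $SS_{between}$, $SS_{within}$, and F by hand. Its F and p-value should match `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Compute the one-way ANOVA F-statistic by partitioning the sum of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size

    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    ss_total = np.sum((all_values - grand_mean) ** 2)  # equals ss_between + ss_within

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "F": f_stat, "p": p_value}

# Hypothetical data for three groups
result = one_way_anova([4, 5, 6, 5], [7, 8, 6, 7], [9, 10, 8, 9])
print(f"F = {result['F']:.2f}, p = {result['p']:.4g}")  # F = 24.00
```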

# 7) Compare the classical (frequentist) approach to ANOVA with the Bayesian approach. 

Frequentist ANOVA:

- Uses point estimates and p-values to reject or fail to reject the null hypothesis.
- Treats parameters as fixed, unknown quantities.
- Provides no direct probability that the null hypothesis is true.

Bayesian ANOVA:

- Combines prior beliefs with the data to obtain posterior distributions for the parameters.
- Provides probability statements about parameters and hypotheses (for example, via Bayes factors).
- Represents uncertainty directly through the posterior, but requires choosing priors and is often more computationally intensive (for example, when it relies on MCMC sampling).
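A full Bayesian ANOVA usually relies on a probabilistic-programming library and explicitly chosen priors. The sketch below shows the kind of output Bayesian analysis gives (a probability for the null hypothesis rather than a p-value) using a different, simpler technique: the BIC approximation to the Bayes factor. It compares a single-common-mean model against a separate-means model. This approximation implicitly assumes default (unit-information) priors, and the posterior probability assumes equal prior odds. Treat it as a rough illustration rather than a proper Bayesian ANOVA. The function name `bic_bayes_factor` and the data are hypothetical.

```python
import numpy as np

def bic_bayes_factor(*groups):
    """Approximate BF10 (separate group means vs. one common mean) via BIC.

    Uses BIC = n * ln(RSS / n) + p * ln(n) for Gaussian models with p mean
    parameters; the shared variance parameter cancels in the comparison.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    y = np.concatenate(groups)
    n, k = y.size, len(groups)
    rss_null = np.sum((y - y.mean()) ** 2)                      # one common mean
    rss_alt = sum(np.sum((g - g.mean()) ** 2) for g in groups)  # one mean per group
    bic_null = n * np.log(rss_null / n) + 1 * np.log(n)
    bic_alt = n * np.log(rss_alt / n) + k * np.log(n)
    bf10 = np.exp((bic_null - bic_alt) / 2)
    p_null = 1 / (1 + bf10)  # posterior P(H0 | data), assuming equal prior odds
    return float(bf10), float(p_null)

# Hypothetical data for three groups
bf10, p_null = bic_bayes_factor([4, 5, 6, 5], [7, 8, 6, 7], [9, 10, 8, 9])
print(f"BF10 = {bf10:.1f}, P(H0 | data) = {p_null:.4f}")
```

A BF10 above 1 favors separate group means; below 1 it favors the single-mean (null) model.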

# 8)  F-test for comparing variances of incomes between two professions (Python Implementation)

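```python
import numpy as np
import scipy.stats as stats

# Data: incomes for two professions
profession_a = [48, 52, 55, 60, 62]
profession_b = [45, 50, 55, 52, 47]

# Sample variances (ddof=1 gives the unbiased estimator).
# Note: stats.variation returns the coefficient of variation, not the variance.
var_a = np.var(profession_a, ddof=1)
var_b = np.var(profession_b, ddof=1)

# F-statistic: ratio of the two sample variances
f_stat = var_a / var_b

# Degrees of freedom
df1 = len(profession_a) - 1
df2 = len(profession_b) - 1

# Two-tailed p-value for H0: equal variances (double the smaller tail)
p_value = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))

print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
```

F = 2.0892, p = 0.4930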

# 9) One-way ANOVA for comparing average heights of three regions (Python Implementation)
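The following is a minimal sketch of a one-way ANOVA for this question, using `scipy.stats.f_oneway`. The height values are hypothetical placeholders, since the original measurements are not given, and should be replaced with the actual data.

```python
from scipy import stats

# Hypothetical heights (cm) for three regions; replace with the actual data
region_a = [160, 162, 158, 165, 161]
region_b = [170, 168, 172, 169, 171]
region_c = [155, 157, 156, 158, 154]

f_stat, p_value = stats.f_oneway(region_a, region_b, region_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")

alpha = 0.05  # significance level
if p_value < alpha:
    print("Significant difference in mean heights among regions")
else:
    print("No significant difference in mean heights detected")
```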

Interpretation:

- If $p < 0.05$, there is a statistically significant difference in mean heights among the regions at the 5% level.
- If $p \ge 0.05$, the data do not provide evidence of a significant difference in mean heights.