1. Explain the properties of the F-distribution. 

Definition: The F-distribution is a continuous probability distribution that arises frequently in analysis of variance (ANOVA) and hypothesis testing.
Skewed Distribution: The F-distribution is positively skewed and bounded below by 0 (i.e., $F \ge 0$).

Shape: The shape of the F-distribution depends on two parameters:

--> $d_1$ (numerator degrees of freedom)

--> $d_2$ (denominator degrees of freedom)

--> Mean: $E[F] = \dfrac{d_2}{d_2 - 2}$, for $d_2 > 2$

--> Variance: $\mathrm{Var}(F) = \dfrac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$, for $d_2 > 4$

--> Right-Tailed Test: In ANOVA and regression the F-test is right-tailed, because only large values of the variance ratio count as evidence against the null hypothesis.

--> Non-Symmetrical: Unlike the normal distribution, the F-distribution is not symmetrical.

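--> Additivity: Unlike the chi-square distribution, the F-distribution is not additive: the sum of independent F-distributed variables does not, in general, follow an F-distribution. A related property that does hold is the reciprocal one: if $F \sim F(d_1, d_2)$, then $1/F \sim F(d_2, d_1)$.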
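The mean and variance formulas can be checked numerically. The sketch below assumes SciPy is installed and uses illustrative degrees of freedom (5 and 10), which are not taken from the original text.

```python
from scipy import stats

# Illustrative degrees of freedom (assumed values, chosen so that d2 > 4)
d1, d2 = 5, 10

# Closed-form mean and variance of the F-distribution
mean_formula = d2 / (d2 - 2)
var_formula = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2)**2 * (d2 - 4))

# Values reported by SciPy for the same distribution
mean_scipy = stats.f.mean(d1, d2)
var_scipy = stats.f.var(d1, d2)

print(mean_formula, mean_scipy)  # the two means should agree
print(var_formula, var_scipy)    # the two variances should agree
```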


2. In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?





Statistical Tests Using the F-Distribution:

Analysis of Variance (ANOVA): Compares means of multiple groups by analyzing variances.

Regression Analysis: Tests for the overall significance of a regression model.

Variance Ratio Test (F-Test): Compares the variances of two populations.

Why Appropriate:

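The F-distribution is the distribution of the ratio of two independent chi-square random variables, each divided by its degrees of freedom. Because a sample variance from a normal population is a scaled chi-square variable, the ratio of two such sample variances follows an F-distribution under the null hypothesis.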

This makes it sensitive to differences in variances between groups.
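The chi-square ratio construction can be illustrated with a small simulation. This is a sketch under assumed settings: the degrees of freedom (4 and 12), the seed and the sample count are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed for reproducibility
d1, d2 = 4, 12                  # illustrative degrees of freedom (assumed)
n = 200_000                     # number of simulated ratios

# Ratio of two independent chi-square variables, each divided by its df
ratio = (rng.chisquare(d1, n) / d1) / (rng.chisquare(d2, n) / d2)

# Compare the simulated 95th percentile with the theoretical F quantile
simulated_q95 = np.quantile(ratio, 0.95)
theoretical_q95 = stats.f.ppf(0.95, d1, d2)
print(simulated_q95, theoretical_q95)
```

The two printed values should be close, with the gap shrinking as `n` grows.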

3. What are the key assumptions required for conducting an F-test to compare the variances of two
populations?





Assumptions for Conducting an F-Test (Comparing Variances):

1. Independence: The samples must be independent of each other.

2. Normality: The populations being compared should follow a normal distribution.

3. Random Sampling: Data should be collected through random sampling.

4. Equal Scale: The measurement scale should be the same for both groups.

Note: The F-test is sensitive to non-normality, so if the normality assumption is violated, a more robust alternative (e.g., Levene's test) may be preferred.
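One possible way to check these assumptions in practice is sketched below, assuming SciPy is available. The two samples are hypothetical and serve only to show the calls: `stats.shapiro` for normality and `stats.levene` as the more robust equal-variance test.

```python
from scipy import stats

# Hypothetical samples, for illustration only
group_a = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5, 13.1]
group_b = [11.9, 15.8, 10.2, 16.4, 9.7, 14.9, 12.3, 13.6]

# Shapiro-Wilk: a small p-value suggests a departure from normality
for name, sample in [("A", group_a), ("B", group_b)]:
    stat, p = stats.shapiro(sample)
    print(f"Shapiro-Wilk group {name}: W={stat:.3f}, p={p:.3f}")

# Levene's test, centered on the median, for equality of variances
lev_stat, lev_p = stats.levene(group_a, group_b, center="median")
print(f"Levene: W={lev_stat:.3f}, p={lev_p:.3f}")
```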



4. What is the purpose of ANOVA, and how does it differ from a t-test? 





### Purpose of ANOVA:

ANOVA tests whether the means of three or more groups are significantly different from each other. It does this by comparing the variance between groups to the variance within groups.

#### t-Test (Comparison):

--> A t-test compares the means of two groups.

#### Key Difference:

--> ANOVA extends the t-test to handle multiple groups, avoiding the increased error rates associated with multiple t-tests.
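The link between the two tests can be seen directly: with exactly two groups, the one-way ANOVA F-statistic equals the square of the pooled-variance t-statistic, and the p-values coincide. The sketch below uses hypothetical data to show this.

```python
from scipy import stats

# Hypothetical two-group data, for illustration only
group_1 = [5.1, 4.9, 5.6, 5.8, 5.0]
group_2 = [6.2, 6.0, 5.7, 6.5, 6.1]

# Pooled-variance (Student) t-test, matching ANOVA's equal-variance assumption
t_stat, t_p = stats.ttest_ind(group_1, group_2, equal_var=True)
f_stat, f_p = stats.f_oneway(group_1, group_2)

print(t_stat**2, f_stat)  # the two values agree up to rounding
print(t_p, f_p)           # so do the p-values
```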

5. Explain when and why you would use a one-way ANOVA instead of multiple t-tests when comparing more
than two groups.





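1. Type I Error Inflation: Each t-test carries its own risk of a false positive, so running many pairwise t-tests pushes the overall (familywise) Type I error rate above the nominal significance level.

2. Error Control: One-way ANOVA avoids this by testing all group means in a single test at the chosen significance level.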

3. Simplification: One overall test result simplifies interpretation, highlighting if at least one group differs without testing every pair individually.
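The size of the error inflation can be estimated with a short calculation. The sketch below assumes the pairwise tests are independent, which is only an approximation, and uses a per-test level of 0.05.

```python
from math import comb

alpha = 0.05  # per-test significance level

for k in (3, 4, 5, 6):
    m = comb(k, 2)  # number of pairwise t-tests among k groups
    # Familywise error rate if the m tests were independent (an approximation)
    fwer = 1 - (1 - alpha) ** m
    print(f"{k} groups: {m} tests, familywise error rate about {fwer:.3f}")
```

Under this approximation, three groups already give a familywise error rate of roughly 0.14 instead of 0.05.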

6. Explain how variance is partitioned in ANOVA into between-group variance and within-group variance.
How does this partitioning contribute to the calculation of the F-statistic?





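Total Variance: The total variation in the data is divided into two components:

$SST = SSB + SSW$

where SST is the total sum of squares, SSB the between-groups sum of squares, and SSW the within-groups sum of squares.

Between-Group Variation (SSB): Measures how far each group mean lies from the grand mean.

$SSB = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$

Within-Group Variation (SSW): Measures how far each observation lies from its own group mean.

$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$

F-Statistic:

$F = \dfrac{MSB}{MSW}$

Where:

$MSB = SSB / d_1$ and $MSW = SSW / d_2$, with $d_1 = k - 1$ and $d_2 = N - k$ for $k$ groups and $N$ total observations.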
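The partition can be computed by hand and compared with SciPy's result. The groups below are hypothetical and chosen so the arithmetic is easy to follow; this is a minimal sketch rather than a general-purpose implementation.

```python
import numpy as np
from scipy import stats

# Hypothetical groups, for illustration only
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), all_values.size

# Between-group, within-group and total sums of squares
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
sst = ((all_values - grand_mean) ** 2).sum()

# Mean squares and F-statistic
msb = ssb / (k - 1)
msw = ssw / (n_total - k)
f_manual = msb / msw

f_scipy, p_scipy = stats.f_oneway(*groups)
print(ssb, ssw, sst)      # SSB + SSW should equal SST
print(f_manual, f_scipy)  # the manual F should match SciPy's
```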

7. Compare the classical (frequentist) approach to ANOVA with the Bayesian approach. What are the key
differences in terms of how they handle uncertainty, parameter estimation, and hypothesis testing?

### Classical (Frequentist) ANOVA	

1. Tests null hypothesis with p-values and significance levels.

2. Parameters are treated as fixed but unknown; uncertainty is expressed through confidence intervals.

3. Reject or fail to reject $H_0$.

4. Point estimates (mean, variance).

5. Less flexible; relies on normality and equal-variance assumptions, which matter most in small samples.

6. p-value and F-statistic.

7. Binary decision (reject or fail to reject)

### Bayesian ANOVA

1. Incorporates prior knowledge and updates beliefs.

2. Uncertainty expressed through posterior distributions.

3. Compares posterior probabilities of models.

4. Full posterior distributions for parameters.

5. More flexible, can handle small samples and complex models.

6. Posterior distributions, Bayes Factors.

7. Degree of belief in different hypotheses.

### Key Takeaway:
--> Frequentist ANOVA is straightforward and widely used but rigid in interpretation.

--> Bayesian ANOVA provides richer insights by quantifying uncertainty directly, making it useful for complex or small data scenarios.

8. Question: You have two sets of data representing the incomes of two different professions:

Profession A: [48, 52, 55, 60, 62]

Profession B: [45, 50, 55, 52, 47]

Perform an F-test to determine if the variances of the two professions' incomes are equal. What are your conclusions based on the F-test?

Task: Use Python to calculate the F-statistic and p-value for the given data.

Objective: Gain experience in performing F-tests and interpreting the results in terms of variance comparison.




In [13]:
import scipy.stats as stats
import numpy as np

V_Profession_A = [48, 52, 55, 60, 62]
V_Profession_B = [45, 50, 55, 52, 47]
alpha = 0.05


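In [9]:
# Sample variances (ddof=1); np.var defaults to the population variance (ddof=0).
# With equal sample sizes the ratio is the same either way, but ddof=1 is the
# correct choice for an F-test in general.
v1 = np.var(V_Profession_A, ddof=1)
v2 = np.var(V_Profession_B, ddof=1)
round(v1, 2)

32.8

In [10]:
round(v2, 2)

15.7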

In [6]:
df1 = len(V_Profession_A)-1
df2 = len(V_Profession_B)-1
df1

4

In [7]:
df2

4

In [12]:
fstats = v1/v2
fstats

2.089171974522293

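In [15]:
# Critical value of F for an upper-tail test at level alpha (this is not a p-value)
f_critical = stats.f.ppf(q=1-alpha, dfn=df1, dfd=df2)
f_critical

6.388232908695868

In [19]:
# Upper-tail p-value: P(F >= fstats) under the null hypothesis of equal variances
p_value = stats.f.sf(fstats, dfn=df1, dfd=df2)

# Rejecting when p_value <= alpha is equivalent to rejecting when fstats >= f_critical
if p_value <= alpha:
    print("reject the null hypothesis")
else:
    print("fail to reject null hypothesis")

fail to reject null hypothesis

Conclusion: The observed F-statistic of about 2.09 is well below the critical value of about 6.39 (equivalently, the upper-tail p-value of about 0.25 exceeds 0.05), so there is no evidence at the 5% level that the income variances of the two professions differ.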
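SciPy does not ship a dedicated two-sample variance F-test, so the steps can be wrapped in a small helper. The function below, `variance_f_test`, is a hypothetical name introduced here for illustration. It also reports a two-sided p-value, which is one common convention when the alternative is simply that the variances differ.

```python
import numpy as np
from scipy import stats

def variance_f_test(x, y):
    """Two-sample F-test for equal variances (hypothetical helper, not a SciPy function)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)  # ratio of sample variances
    dfn, dfd = x.size - 1, y.size - 1
    p_upper = stats.f.sf(f_stat, dfn, dfd)   # P(F >= f_stat) under H0
    p_lower = stats.f.cdf(f_stat, dfn, dfd)  # P(F <= f_stat) under H0
    p_two_sided = min(1.0, 2 * min(p_upper, p_lower))
    return f_stat, p_upper, p_two_sided

f_stat, p_upper, p_two_sided = variance_f_test([48, 52, 55, 60, 62],
                                               [45, 50, 55, 52, 47])
print(f"F = {f_stat:.4f}, one-sided p = {p_upper:.4f}, two-sided p = {p_two_sided:.4f}")
```

Both p-values are above 0.05 for this data, so the conclusion is the same under either convention.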



9. Question: Conduct a one-way ANOVA to test whether there are any statistically significant differences in
average heights between three different regions with the following data:

Region A: [160, 162, 165, 158, 164]

Region B: [172, 175, 170, 168, 174]

Region C: [180, 182, 179, 185, 183]

Task: Write Python code to perform the one-way ANOVA and interpret the results.

Objective: Learn how to perform one-way ANOVA using Python and interpret F-statistic and p-value.

In [20]:
V_Region_A = [160, 162, 165, 158, 164]
V_Region_B = [172, 175, 170, 168, 174]
V_Region_C = [180, 182, 179, 185, 183]

fstats,p_value = stats.f_oneway(V_Region_A,V_Region_B,V_Region_C) 

In [21]:
fstats

67.87330316742101

In [22]:
p_value

2.870664187937026e-07

In [23]:
if p_value <= 0.05:
    print("reject the null hypothesis")
else:
    print("fail to reject null hypothesis")

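reject the null hypothesis

Interpretation: With F ≈ 67.87 and p ≈ 2.9e-07, far below 0.05, the null hypothesis of equal mean heights is rejected: at least one region's mean height differs from the others. ANOVA alone does not say which regions differ; a post-hoc comparison (e.g., Tukey's HSD) would be needed for that.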
