In [None]:
# Q.1 Explain the properties of the F-distribution.

'''

The F-distribution is a continuous probability distribution commonly used in statistical tests, such as ANOVA and the F-test,
for comparing variances. It arises as the ratio of two independent chi-squared random variables, each divided by its own degrees of freedom.
Here are its key properties.


~ Hypothesis Testing ~
The F-distribution is used for:

ANOVA: Testing equality of group means by comparing variance within groups to variance between groups.

Variance Comparison: Testing if two populations have equal variances.

Regression: Assessing the overall significance of a regression model.


~ Tail Probabilities ~
F-tests are usually right-tailed. The F-statistic is a ratio of variances, so it cannot be negative, and evidence against the null hypothesis appears as large values in the upper tail.



These properties make the F-distribution a fundamental tool in inferential statistics,
particularly for comparing variances and assessing model fit.

'''
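
As a minimal sketch of the definition above, the following simulation builds F-distributed values from two independent chi-squared samples and compares them with scipy.stats.f. The degrees of freedom (5 and 10), the sample size, and the seed are illustrative assumptions, not values from the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed (assumption) for reproducibility
d1, d2 = 5, 10                   # hypothetical numerator / denominator df
n = 200_000

# Ratio of two independent chi-squared variables, each divided by its df
chi_num = rng.chisquare(d1, size=n)
chi_den = rng.chisquare(d2, size=n)
f_samples = (chi_num / d1) / (chi_den / d2)

# Theoretical mean is d2 / (d2 - 2) when d2 > 2
theoretical_mean = stats.f.mean(d1, d2)
simulated_mean = f_samples.mean()

# Compare the 95th percentile of the simulation with the theoretical quantile
q95_theory = stats.f.ppf(0.95, d1, d2)
q95_sim = np.quantile(f_samples, 0.95)

print("minimum simulated value:", f_samples.min())
print("mean (theory, simulated):", theoretical_mean, simulated_mean)
print("95th percentile (theory, simulated):", q95_theory, q95_sim)
```

The simulated values are never negative and are right-skewed, which matches the properties described above.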

In [None]:
# Q.2 In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?

'''
The F-distribution is used in a variety of statistical tests where comparing variances or testing ratios of variances is essential.
Here are the main types of statistical tests that utilize the F-distribution and the reasons why it is appropriate for these tests.


~ Analysis of Variance (ANOVA) ~

Purpose:
To compare the means of two or more groups by analyzing the ratio of between-group variance to within-group variance.


Appropriateness:
The F-distribution is ideal for this test because the test statistic is a ratio of two variances.
A large F-statistic suggests significant differences between group means relative to within-group variability.



~ F-Test for Equality of Variances ~

Purpose:
To compare the variances of two populations to determine if they are equal.

Appropriateness:
The F-distribution is derived directly from the ratio of variances, making it the natural choice for this test.
Variances are always non-negative, matching the non-negativity of the F-distribution.




Why is the F-Distribution Appropriate?

1. Under normality, the ratio of two independent sample variances (each scaled by its population variance) follows an F-distribution, which aligns directly with tests that compare variances.
2. The shape of the F-distribution depends on the degrees of freedom of the numerator and the denominator, so it adapts to different sample sizes and numbers of groups.
3. The F-distribution is right-skewed and defined only for non-negative values, so unusually large variance ratios fall in the upper tail, which is where the rejection region is placed.

'''
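
To illustrate how the degrees of freedom change the test, here is a small sketch that looks up the 5% critical value with scipy.stats.f.ppf. The numerator degrees of freedom (2, as for three groups) and the denominator values are hypothetical choices made for illustration.

```python
from scipy.stats import f

alpha = 0.05
dfn = 2  # hypothetical numerator df (e.g. three groups in an ANOVA)

# Critical value of the right-tailed test for several denominator df
critical_values = {dfd: f.ppf(1 - alpha, dfn, dfd) for dfd in (5, 12, 60)}

for dfd, crit in critical_values.items():
    print(f"dfn={dfn}, dfd={dfd}: reject H0 if F > {crit:.3f}")
```

With more denominator degrees of freedom (larger samples), the critical value shrinks, so a smaller variance ratio is enough to reject the null hypothesis.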

In [None]:
# Q.3 What are the key assumptions required for conducting an F-test to compare the variances of two populations?

'''

The F-test for comparing the variances of two populations requires several key assumptions to ensure the validity of the test.
These assumptions are as follows:


1. Independence of Observations
The data samples from each population must be independent of each other.
Each observation within a sample must also be independent of other observations in the same sample.
Violations of this assumption (e.g., paired or correlated samples) can lead to invalid conclusions.

2. Normality of Populations
The populations from which the samples are drawn must follow a normal distribution.
This assumption is critical because the F-distribution is derived based on the normality of the underlying data.
If normality is violated, the F-test may produce inaccurate results, especially for small sample sizes.

3. Random Sampling
The samples must be randomly selected from the populations of interest.
Random sampling ensures that the results are representative and unbiased.

4. Non-Negativity of Variances
Variances cannot be negative, as they are averages of squared deviations.
This is a mathematical property rather than a condition to verify, but it explains why the F-statistic is always non-negative.


'''
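
A minimal sketch of how the normality assumption might be checked before running the F-test. The two samples are randomly generated here purely for illustration. Shapiro-Wilk is one common normality check, and Levene's test is shown as a commonly used alternative for comparing variances that is less sensitive to non-normality; neither is prescribed by the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed (assumption)

# Hypothetical samples drawn from normal populations with different spreads
sample_1 = rng.normal(loc=50, scale=5, size=30)
sample_2 = rng.normal(loc=50, scale=8, size=30)

# Shapiro-Wilk: H0 is that the sample comes from a normal population
w_1, p_normal_1 = stats.shapiro(sample_1)
w_2, p_normal_2 = stats.shapiro(sample_2)

# Levene's test: H0 is that the populations have equal variances
levene_stat, p_levene = stats.levene(sample_1, sample_2)

print("Shapiro-Wilk p-values:", p_normal_1, p_normal_2)
print("Levene p-value:", p_levene)
```

A small Shapiro-Wilk p-value would suggest that the normality assumption is doubtful, in which case the variance F-test should be interpreted with caution.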







In [None]:
# Q.4 What is the purpose of ANOVA, and how does it differ from a t-test?

'''

~ Purpose of ANOVA ~

The primary purpose of Analysis of Variance (ANOVA) is to test whether there are statistically significant differences in the means of three
or more groups or categories. It examines the variation between groups compared to the variation within groups to determine if the group means
are different beyond what could be expected due to random chance.



Key Differences

* Number of Groups: *
---------------------
The t-test is limited to comparing two group means.

ANOVA handles three or more groups simultaneously, making it more versatile for multi-group comparisons.

* Type I Error Control: *
-------------------------
Conducting multiple t-tests increases the likelihood of making at least one Type I error (incorrectly rejecting a true null hypothesis).

ANOVA mitigates this by testing all groups at once under a single null hypothesis.

* Post-Hoc Testing: *
-----------------------
ANOVA tells you if there is a significant difference among groups, but it does not specify which groups differ. Post-hoc tests (e.g., Tukey's HSD) are required for pairwise comparisons.

In contrast, a t-test directly compares the two groups in question.

* Underlying Test Statistic: *
-----------------------------

The t-test uses the t-statistic derived from the difference between means.

ANOVA uses the F-statistic derived from the ratio of variances.


'''
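
The link between the two test statistics can be sketched with two groups: for an equal-variance t-test, the one-way ANOVA F-statistic equals the square of the t-statistic, and the p-values agree. The data below are hypothetical values chosen for illustration.

```python
from scipy.stats import f_oneway, ttest_ind

# Hypothetical measurements for two independent groups
group_1 = [20, 22, 19, 24, 25]
group_2 = [28, 30, 27, 26, 29]

# Student's t-test (equal variances assumed, the SciPy default)
t_stat, p_t = ttest_ind(group_1, group_2)

# One-way ANOVA on the same two groups
f_stat, p_f = f_oneway(group_1, group_2)

print("t^2 =", t_stat ** 2, " F =", f_stat)
print("p (t-test) =", p_t, " p (ANOVA) =", p_f)
```

With exactly two groups the two tests give the same conclusion; ANOVA generalizes the comparison to three or more groups.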



In [None]:
# Q.5 Explain when and why you would use a one-way ANOVA instead of multiple t-tests when comparing more than two groups.

'''

You would use a one-way ANOVA instead of multiple t-tests when comparing more than two groups for the following reasons:

1. Efficiency and Simplicity

Problem with Multiple t-Tests:
      Conducting and interpreting multiple t-tests is time-consuming and inefficient, especially as the number of groups increases.

Solution with ANOVA:
      ANOVA provides a single test to determine if there are any significant differences among the group means, simplifying the analysis.


2. Provides a Comprehensive Comparison

Limitations of t-Tests:
      A t-test compares only two means at a time. While this can identify pairwise differences, it doesn’t provide a holistic view of
       whether differences exist across all groups.
Advantage of ANOVA:
      ANOVA considers the variability across all groups simultaneously, making it a better choice for testing the overall
       null hypothesis that all group means are equal.

3. Ensures Validity of Results

Problem with Multiple Testing:
      Pairwise t-tests reuse the same groups, so the tests are not independent of one another, and each test estimates variability from only two groups.

ANOVA as a Solution:
      ANOVA pools all data points into a single estimate of within-group variance and tests all group means in one step.


When to Use a One-Way ANOVA?

* When comparing the means of three or more groups.
* When the groups are independent.
* When the data in each group are approximately normally distributed and the group variances are equal (homogeneity of variances).


'''
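
The inflation of the Type I error rate from running many t-tests can be sketched numerically. The formula 1 - (1 - alpha)^m assumes the m pairwise tests are independent, which is only an approximation here because the tests share data; the function name is a hypothetical helper.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise tests, approximate chance of >= 1 false positive).

    Assumes the pairwise tests are independent, which is a simplification.
    """
    n_tests = comb(n_groups, 2)          # every pair of groups gets one t-test
    fwer = 1 - (1 - alpha) ** n_tests    # P(at least one Type I error)
    return n_tests, fwer


for k in (3, 5, 10):
    n_tests, fwer = familywise_error_rate(k)
    print(f"{k} groups -> {n_tests} t-tests, familywise error rate ~ {fwer:.3f}")
```

Under this approximation, three groups already push the error rate from 5% to about 14%, and five groups to about 40%, whereas a single ANOVA keeps the overall test at the chosen significance level.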


In [None]:
# Q.6  Explain how variance is partitioned in ANOVA into between-group variance and within-group variance. How does this partitioning contribute to the calculation of the F-statistic?


'''

In ANOVA, the total variance in the dataset is partitioned into two components: between-group variance and within-group variance.
This partitioning allows us to determine whether the differences between group means are statistically significant.


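Between-group variance : Captures how far the group means are from the overall mean (sum of squares SSB, with k - 1 degrees of freedom).
Within-group variance : Captures how far individual observations are from their group mean (sum of squares SSW, with N - k degrees of freedom).

Here k is the number of groups and N is the total number of observations. The total sum of squares splits exactly as SST = SSB + SSW.

The F-statistic is the ratio of the two mean squares, F = (SSB / (k - 1)) / (SSW / (N - k)). A value much larger than 1 means the between-group variance is large relative to the within-group variance,
indicating that at least one group mean differs from the others.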

'''
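
The partition can be sketched by hand and checked against scipy.stats.f_oneway. This example reuses the regional height data from Q.9; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway

# Height data from Q.9, one array per region
groups = [
    np.array([160, 162, 165, 158, 164], dtype=float),
    np.array([172, 175, 170, 168, 174], dtype=float),
    np.array([180, 182, 179, 185, 183], dtype=float),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group sum of squares: group means vs. overall mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations vs. their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total sum of squares: observations vs. overall mean
ss_total = ((all_values - grand_mean) ** 2).sum()

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

f_scipy, p_scipy = f_oneway(*groups)
print("SSB + SSW =", ss_between + ss_within, " SST =", ss_total)
print("F (manual) =", f_manual, " F (scipy) =", f_scipy)
```

The two sums of squares add up to the total, and the manual F-statistic matches the value returned by f_oneway.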


In [None]:
 # Q.7 Compare the classical (frequentist) approach to ANOVA with the Bayesian approach. What are the key differences in terms of how they handle uncertainty, parameter estimation, and hypothesis testing?




In [1]:
# Q.8 Question: You have two sets of data representing the incomes of two different professions

 # Task: Use Python to calculate the F-statistic and p-value for the given data



import numpy as np
from scipy.stats import f

# Given Data for Profession A and B
profession_a = np.array([48, 52, 55, 60, 62])
profession_b = np.array([45, 50, 55, 52, 47])

# Calculate variances
var_a = np.var(profession_a, ddof=1)  # Sample variance (ddof=1 for unbiased estimate)
var_b = np.var(profession_b, ddof=1)

# Calculate F-statistic
f_statistic = var_a / var_b

# Degrees of freedom
df1 = len(profession_a) - 1
df2 = len(profession_b) - 1

# Calculate p-value
p_value = f.sf(f_statistic, df1, df2)  # Upper tail probability

f_statistic, p_value


(2.089171974522293, 0.24652429950266966)

In [None]:
'''
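# Null Hypothesis : The variances of incomes for Profession A and Profession B are equal.

# Alternative Hypothesis : The variances of incomes for Profession A and Profession B are not equal.


The calculated F-statistic is approximately 2.089, with (4, 4) degrees of freedom.

f.sf returns the upper-tail (one-sided) probability, which is approximately 0.247. For the two-sided alternative stated above, this value is doubled to approximately 0.493.

Either p-value is greater than the common significance level (α=0.05).

Therefore, we fail to reject the null hypothesis: the data do not provide evidence that the two income variances differ.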


'''
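
The code cell for Q.8 uses f.sf, which gives an upper-tail probability. A two-sided version can be sketched as below by doubling the smaller tail probability. The function name two_sided_f_test is a hypothetical helper, and doubling the smaller tail is one common convention rather than the only option.

```python
import numpy as np
from scipy.stats import f


def two_sided_f_test(sample_x, sample_y):
    """Variance-ratio F-test with a two-sided p-value (doubling the smaller tail)."""
    var_x = np.var(sample_x, ddof=1)   # unbiased sample variances
    var_y = np.var(sample_y, ddof=1)
    f_stat = var_x / var_y
    dfn, dfd = len(sample_x) - 1, len(sample_y) - 1
    upper = f.sf(f_stat, dfn, dfd)     # P(F >= observed)
    lower = f.cdf(f_stat, dfn, dfd)    # P(F <= observed)
    return f_stat, min(1.0, 2 * min(upper, lower))


f_stat, p_two_sided = two_sided_f_test(
    [48, 52, 55, 60, 62],  # Profession A incomes from Q.8
    [45, 50, 55, 52, 47],  # Profession B incomes from Q.8
)
print(f_stat, p_two_sided)
```

The two-sided p-value is twice the one-sided value from the original cell, and the conclusion at α=0.05 is unchanged.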

In [2]:
#Q.9  Question: Conduct a one-way ANOVA to test whether there are any statistically significant differences in average heights between three different regions with the following data

#  Task: Write Python code to perform the one-way ANOVA and interpret the results


from scipy.stats import f_oneway

# Given Data for three regions
region_a = [160, 162, 165, 158, 164]
region_b = [172, 175, 170, 168, 174]
region_c = [180, 182, 179, 185, 183]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(region_a, region_b, region_c)

f_statistic, p_value


(67.87330316742101, 2.870664187937026e-07)

In [None]:
'''

# Null Hypothesis : The average heights across the three regions are the same.

# Alternative Hypothesis : At least one region has a significantly different average height.



The p-value is extremely small (about 2.87e-07), far below the common significance level (α=0.05).

Therefore, we reject the null hypothesis: at least one region's average height differs from the others.

'''
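
Because ANOVA does not say which regions differ, a post-hoc comparison is a natural follow-up. The sketch below uses scipy.stats.tukey_hsd, which is assumed to be available (it was added in SciPy 1.8); the pair labels are illustrative.

```python
from scipy.stats import tukey_hsd

# Height data from Q.9
region_a = [160, 162, 165, 158, 164]
region_b = [172, 175, 170, 168, 174]
region_c = [180, 182, 179, 185, 183]

# Tukey's HSD compares every pair of groups while controlling the familywise error rate
result = tukey_hsd(region_a, region_b, region_c)

labels = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        # result.pvalue[i, j] is the adjusted p-value for groups i and j
        print(f"Region {labels[i]} vs Region {labels[j]}: p = {result.pvalue[i, j]:.5f}")
```

Pairs with an adjusted p-value below 0.05 are the ones whose mean heights differ significantly.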