In [None]:
#1- Explain the properties of the F-distribution.
#The F-distribution is a probability distribution used primarily to analyze variances and is often encountered in ANOVA (Analysis of Variance) testing.
# Here are three simple key properties:

#Shape: The F-distribution is right-skewed: it is asymmetric, with a long tail extending to the right.
#Its shape depends on two parameters, the numerator and denominator degrees of freedom; as both increase, the distribution becomes less skewed and concentrates around 1.

#Range: The values of the F-distribution are always positive, ranging from 0 to infinity.
#This is because it represents a ratio of variances, and variances cannot be negative.

#Usage in hypothesis testing: The F-distribution is used to compare two variances and determine whether they are significantly different.
#This is crucial in testing hypotheses about group means in experiments, where the variance between group means is compared to the variance within groups.
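
In [None]:
#The properties above can be checked numerically. This is a minimal sketch using scipy.stats.f;
#the degrees of freedom (5 and 20) are assumptions chosen only for illustration.

```python
from scipy import stats

# Hypothetical degrees of freedom, chosen only for illustration
dfn, dfd = 5, 20
dist = stats.f(dfn, dfd)

# Mean, variance, and skewness of the distribution
mean, var, skew = (float(m) for m in dist.stats(moments="mvs"))
median = float(dist.median())

# No probability mass lies at or below zero
prob_below_zero = float(dist.cdf(0.0))

print(f"mean = {mean:.3f}, median = {median:.3f}, skewness = {skew:.3f}")
print(f"P(F <= 0) = {prob_below_zero}")
```

#A mean above the median and a positive skewness are both signs of the right skew described above.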

In [None]:
#2- In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?
#The F-distribution is used in the following types of statistical tests:

#ANOVA (Analysis of Variance): It's used in ANOVA to compare means of three or more groups by analyzing the ratio of variance between groups to variance within groups.
#The F-distribution is appropriate here because, when the null hypothesis of equal means holds and the ANOVA assumptions are met,
# the ratio of the between-group mean square to the within-group mean square follows an F-distribution, so an unusually large ratio indicates that at least one group mean differs.

#Regression Analysis: In regression, the F-test assesses the overall significance of the regression model.
#It compares the variance explained by the model to the variance of the residuals (errors).
#The F-distribution is suitable because it helps to determine if the model provides a better fit to the data than a model with no independent variables.
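
In [None]:
#A rough sketch of the overall regression F-test computed by hand with NumPy.
#The data (x, noise, y) are made up for illustration; k is the number of predictors.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y depends roughly linearly on x
x = np.arange(10, dtype=float)
noise = np.array([0.3, -0.5, 0.2, 0.1, -0.4, 0.6, -0.2, 0.0, 0.4, -0.3])
y = 1.0 + 2.0 * x + noise

# Fit y = b0 + b1 * x by ordinary least squares
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

n, k = len(y), 1                               # k = number of predictors
ss_model = np.sum((fitted - y.mean()) ** 2)    # variation explained by the model
ss_resid = np.sum((y - fitted) ** 2)           # unexplained (residual) variation

# F = explained mean square / residual mean square
F_reg = (ss_model / k) / (ss_resid / (n - k - 1))
p_reg = stats.f.sf(F_reg, k, n - k - 1)        # upper-tail p-value

print(f"F = {F_reg:.2f}, p = {p_reg:.3g}")
```

#A small p-value here means the model fits better than an intercept-only model.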

In [None]:

#3- What are the key assumptions required for conducting an F-test to compare the variances of two
#populations?
#When conducting an F-test to compare the variances of two populations, the key assumptions are:

#Normality: Both populations from which the samples are drawn must be normally distributed.
#The F-test is sensitive to deviations from normality.

#Independence: The samples must be independent of each other.
#This means that the data collected from one sample should not influence the data collected from another.

#Ratio convention: Placing the larger sample variance in the numerator is a common convention rather than an assumption;
#it makes F >= 1 so that only the upper tail of the distribution has to be consulted.
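
In [None]:
#One possible way to check the normality assumption before an F-test, sketched with made-up samples.
#sample_1 and sample_2 are hypothetical. Levene's test is shown as an alternative that is less sensitive to non-normality.

```python
from scipy import stats

# Hypothetical samples, used only for illustration
sample_1 = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.7, 12.2]
sample_2 = [10.9, 12.8, 11.5, 13.6, 12.0, 10.4, 13.1, 11.7]

# Shapiro-Wilk: a small p-value suggests the sample is not normally distributed
for name, sample in [("sample_1", sample_1), ("sample_2", sample_2)]:
    w_stat, p_normal = stats.shapiro(sample)
    print(f"{name}: W = {w_stat:.3f}, p = {p_normal:.3f}")

# Levene's test also compares variances but tolerates non-normal data better
lev_stat, p_levene = stats.levene(sample_1, sample_2)
print(f"Levene: W = {lev_stat:.3f}, p = {p_levene:.3f}")
```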

In [None]:
#4- What is the purpose of ANOVA, and how does it differ from a t-test?
#The purpose of ANOVA (Analysis of Variance) and how it differs from a t-test can be summarized in a few key points:

#Purpose of ANOVA: ANOVA is used to determine if there are any statistically significant differences between the means of three or more independent (unrelated) groups.
# It is an omnibus test: it detects whether any difference exists among the group means, not which specific groups differ.

#Multiple Groups: Unlike the t-test, which is typically used to compare the means between two groups,
#ANOVA can handle multiple groups at once, making it ideal for experiments involving more than two treatments or conditions.

#Overall Variance: ANOVA decomposes the variances within and between groups to see if the mean differences among groups are greater than would be expected by chance.
#It tests one or more factors by comparing the response variable means at the different factor levels.

#Assumptions: Both tests assume that the data come from normally distributed populations with equal variances (homogeneity of variance) and that observations are independent.
#Both are reasonably robust to moderate departures from normality when samples are large and group sizes are similar.

#These distinctions make ANOVA a versatile tool for more complex experimental designs compared to the t-test, which is more limited in scope.
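
In [None]:
#With exactly two groups the two tests agree: the ANOVA F-statistic equals the square of the equal-variance t-statistic.
#A small sketch of that relationship; group_1 and group_2 are made-up values.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two groups
group_1 = [5.1, 4.9, 5.6, 5.8, 5.2]
group_2 = [6.0, 6.3, 5.7, 6.5, 6.1]

# Independent two-sample t-test (equal variances assumed by default)
t_stat, p_t = stats.ttest_ind(group_1, group_2)

# One-way ANOVA on the same two groups
f_stat, p_f = stats.f_oneway(group_1, group_2)

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
print(f"p (t-test) = {p_t:.6f}, p (ANOVA) = {p_f:.6f}")
```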

In [None]:
#5- Explain when and why you would use a one-way ANOVA instead of multiple t-tests when comparing more
#than two groups.

#You would use a one-way ANOVA instead of multiple t-tests in certain situations for these main reasons:

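#Risk of Type I Errors: When comparing more than two groups,
#conducting multiple t-tests increases the chance of making at least one Type I error, that is,
#incorrectly rejecting a true null hypothesis. For example, three groups need three pairwise tests; at a 0.05 level each,
#the chance of at least one false positive is about 1 - 0.95^3 = 0.14 if the tests were independent.
#ANOVA tests all group means in a single test, keeping the overall error rate at the chosen level (usually 0.05).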

#Efficiency and Simplicity: ANOVA provides a single, comprehensive analysis that tests differences across all groups simultaneously,
#which is more efficient and simpler than running and interpreting multiple pairwise t-tests.
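
In [None]:
#How quickly the error rate grows with the number of groups can be sketched as follows.
#familywise_error_rate is a hypothetical helper, and the formula assumes the pairwise tests are independent,
#which is only an approximation for t-tests that share groups.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise tests,
    assuming the tests are independent (an approximation)."""
    n_tests = comb(n_groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests

for k in (3, 4, 5):
    print(f"{k} groups: {comb(k, 2)} tests, "
          f"familywise error rate = {familywise_error_rate(k):.3f}")
```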

In [None]:
#6-Explain how variance is partitioned in ANOVA into between-group variance and within-group variance.
#How does this partitioning contribute to the calculation of the F-statistic?

#In ANOVA, the total variation is partitioned into between-group and within-group components (SS_total = SS_between + SS_within):
#Partitioning Variance:
#Between-Group Variance: This measures how much the group means deviate from the overall (grand) mean of all observations combined.
# It reflects the variation attributable to the different treatments or conditions applied to the groups.

#Within-Group Variance: This measures the variation within each group,
#accounting for the natural variability in the data that is not due to the treatment effects. It essentially captures random noise and individual differences within each group.


#Contribution to F-Statistic:
#The F-statistic in ANOVA is the ratio of the between-group mean square to the within-group mean square: F = MS_between / MS_within,
# where MS_between = SS_between / (k - 1) and MS_within = SS_within / (N - k) for k groups and N total observations.
# A higher ratio suggests that a significant portion of the overall variance is due to the differences between group means (i.e., due to treatment effects),
# rather than random variation within groups. This ratio is what the F-test uses to determine if the observed differences in means across groups are statistically significant.
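
In [None]:
#The partitioning can be carried out by hand. This sketch uses the region heights from question 9
#and compares the manual result with scipy.stats.f_oneway; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Region heights from question 9
groups = [
    np.array([160, 162, 165, 158, 164], dtype=float),
    np.array([172, 175, 170, 168, 174], dtype=float),
    np.array([180, 182, 179, 185, 183], dtype=float),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, N = len(groups), len(all_values)

# Between-group sum of squares: group means versus the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations versus their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_values - grand_mean) ** 2).sum()

# Mean squares and the F-statistic
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F_manual = ms_between / ms_within
p_manual = stats.f.sf(F_manual, k - 1, N - k)

F_scipy, p_scipy = stats.f_oneway(*groups)
print(f"SS_between = {ss_between:.1f}, SS_within = {ss_within:.1f}, SS_total = {ss_total:.1f}")
print(f"F (manual) = {F_manual:.4f}, F (scipy) = {F_scipy:.4f}")
```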

In [None]:
#7- Compare the classical (frequentist) approach to ANOVA with the Bayesian approach. What are the key
#differences in terms of how they handle uncertainty, parameter estimation, and hypothesis testing?

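#The two approaches differ in how they treat uncertainty, parameters, and hypotheses:

#Uncertainty: The classical approach treats parameters (group means, variances) as fixed but unknown and describes uncertainty through the sampling distribution of statistics (p-values, confidence intervals).
#The Bayesian approach treats parameters as random variables and describes uncertainty with probability distributions over them.

#Parameter estimation: Classical ANOVA produces point estimates (e.g., group means) with confidence intervals based on the data alone.
#Bayesian ANOVA combines a prior distribution with the likelihood of the data to obtain a posterior distribution for each parameter, summarized by credible intervals.

#Hypothesis testing: Classical ANOVA computes an F-statistic and p-value and rejects or fails to reject the null hypothesis at a fixed significance level.
#Bayesian ANOVA compares hypotheses or models directly, for example with Bayes factors or posterior probabilities, and can quantify evidence in favor of the null hypothesis.

#In short, the classical way is like measuring with a fixed ruler every time, while the Bayesian way is like updating an educated guess as new clues arrive.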

In [None]:
#8-Question: You have two sets of data representing the incomes of two different professions:
# Profession A: [48, 52, 55, 60, 62]
# Profession B: [45, 50, 55, 52, 47]
#Perform an F-test to determine if the variances of the two professions'
#incomes are equal. What are your conclusions based on the F-test?

#Task: Use Python to calculate the F-statistic and p-value for the given data.

#Objective: Gain experience in performing F-tests and interpreting the results in terms of variance comparison

In [None]:
#8-
import numpy as np
import scipy.stats as stats

# Given data
income_A = [48, 52, 55, 60, 62]
income_B = [45, 50, 55, 52, 47]

# Calculate the variances
var_A = np.var(income_A, ddof=1)
var_B = np.var(income_B, ddof=1)

# Perform F-test
F_statistic = var_A / var_B

# Degrees of freedom for each sample
dfn = len(income_A) - 1
dfd = len(income_B) - 1

# Calculate the two-tailed p-value (H0: the two variances are equal).
# stats.f.sf gives the upper-tail probability; doubling the smaller tail makes the test two-sided.
p_value = 2 * min(stats.f.cdf(F_statistic, dfn, dfd), stats.f.sf(F_statistic, dfn, dfd))

print(f"F = {F_statistic:.4f}, p = {p_value:.4f}")



F = 2.0892, p = 0.4930

In [None]:
#The F-statistic calculated from the incomes of the two professions is approximately 2.09, and the two-tailed p-value is about 0.493.

#Conclusion:

#1-The p-value is much higher than the typical significance level (e.g., 0.05),
#so there is not enough evidence to reject the null hypothesis that the variances of the two professions' incomes are equal.

#2-Based on the F-test, the incomes for Profession A and Profession B do not show a statistically significant difference in variance.
#With only five observations per group the test has little power, so this result is not proof that the variances are equal.

In [None]:
#9-Question: Conduct a one-way ANOVA to test whether there are any statistically significant differences in
#average heights between three different regions with the following data:
# Region A: [160, 162, 165, 158, 164]
# Region B: [172, 175, 170, 168, 174]
# Region C: [180, 182, 179, 185, 183]
# Task: Write Python code to perform the one-way ANOVA and interpret the results
# Objective: Learn how to perform one-way ANOVA using Python and interpret F-statistic and p-value

In [None]:
#9-
import scipy.stats as stats

# Data for the three regions
heights_A = [160, 162, 165, 158, 164]
heights_B = [172, 175, 170, 168, 174]
heights_C = [180, 182, 179, 185, 183]

# Perform one-way ANOVA
F_statistic, p_value = stats.f_oneway(heights_A, heights_B, heights_C)

F_statistic, p_value



(67.87330316742101, 2.870664187937026e-07)

In [None]:
#The one-way ANOVA on the heights from the three regions gives an F-statistic of approximately 67.87
#with a corresponding p-value of about 2.87e-07.

#Interpretation:
#The very small p-value (much less than the common significance level of 0.05) means we reject the null hypothesis that all three regions have the same mean height.
#There are statistically significant differences in average height across the three regions.
#The ANOVA alone does not show which regions differ; a post-hoc comparison is needed for that.
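
In [None]:
#A significant ANOVA is often followed by pairwise comparisons to see which groups differ.
#This is one simple option, not the only one: pairwise t-tests with a Bonferroni correction.
#The dictionary layout and variable names are illustrative.

```python
from itertools import combinations
from scipy import stats

# Region heights from question 9
regions = {
    "A": [160, 162, 165, 158, 164],
    "B": [172, 175, 170, 168, 174],
    "C": [180, 182, 179, 185, 183],
}

pairs = list(combinations(regions, 2))   # (A, B), (A, C), (B, C)
adjusted = {}

for a, b in pairs:
    _, p_raw = stats.ttest_ind(regions[a], regions[b])
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1
    adjusted[(a, b)] = min(1.0, p_raw * len(pairs))
    print(f"Region {a} vs Region {b}: adjusted p = {adjusted[(a, b)]:.3g}")
```

#Any pair whose adjusted p-value is below 0.05 differs significantly at the 5% familywise level.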