In [None]:
# Q8: Question: You have two sets of data representing the incomes of two different professions:
# Profession A: [48, 52, 55, 60, 62]
# Profession B: [45, 50, 55, 52, 47]
# Perform an F-test to determine whether the variances of the two professions' incomes are equal.
# What are your conclusions based on the F-test?

# Task: Use Python to calculate the F-statistic and p-value for the given data.

# Objective: Gain experience in performing F-tests and interpreting the results in terms of variance comparison.

In [2]:
# Ans: Here's the Python code to perform the F-test for comparing the variances of incomes between Profession A and Profession B:

import numpy as np
from scipy.stats import f

# Given data
profession_a = np.array([48, 52, 55, 60, 62])
profession_b = np.array([45, 50, 55, 52, 47])

# Step 1: Calculate sample variances
var_a = np.var(profession_a, ddof=1)  # Sample variance of Profession A
var_b = np.var(profession_b, ddof=1)  # Sample variance of Profession B

# Step 2: Calculate the F-statistic (larger variance divided by smaller variance)
if var_a > var_b:
    f_statistic = var_a / var_b
    dfn, dfd = len(profession_a) - 1, len(profession_b) - 1  # Degrees of freedom
else:
    f_statistic = var_b / var_a
    dfn, dfd = len(profession_b) - 1, len(profession_a) - 1  # Degrees of freedom

# Step 3: Calculate the p-value
p_value = 2 * min(f.cdf(f_statistic, dfn, dfd), 1 - f.cdf(f_statistic, dfn, dfd))  # two-tailed p-value

# Display the results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

### Explanation:
# - **`np.var(data, ddof=1)`**: Calculates the sample variance; `ddof=1` divides by n - 1, which gives the unbiased estimate.
# - **F-statistic**: The ratio of the larger sample variance to the smaller one, so it is always at least 1.
# - **p-value**: A two-tailed p-value, computed by doubling the smaller of the two tail probabilities, `f.cdf(F)` and `1 - f.cdf(F)`. It is the probability, under the null hypothesis of equal variances, of observing an F-statistic at least as extreme as the calculated one.

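# Conclusion: with the output below (F ≈ 2.09, p ≈ 0.49), the p-value is well above the usual 0.05 significance level, so we fail to reject the null hypothesis. The data give no evidence that the income variances of the two professions differ.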

F-statistic: 2.089171974522293
p-value: 0.49304859900533904
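The F-test steps above can be wrapped in a small reusable function that also applies the decision rule. This is only a sketch: the name `variance_f_test` and the default `alpha=0.05` are assumptions for illustration, not part of the original answer. It uses the same two-tailed formula as the cell above, with `f.sf` standing in for `1 - f.cdf`.

```python
import numpy as np
from scipy.stats import f


def variance_f_test(x, y, alpha=0.05):
    """Two-tailed F-test for equal variances (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var_x = np.var(x, ddof=1)  # sample variances (divide by n - 1)
    var_y = np.var(y, ddof=1)

    # Put the larger sample variance in the numerator so that F >= 1
    if var_x >= var_y:
        f_stat, dfn, dfd = var_x / var_y, len(x) - 1, len(y) - 1
    else:
        f_stat, dfn, dfd = var_y / var_x, len(y) - 1, len(x) - 1

    # Two-tailed p-value: double the smaller tail probability
    p_value = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))
    return f_stat, p_value, bool(p_value < alpha)


f_stat, p_value, reject = variance_f_test(
    [48, 52, 55, 60, 62],  # Profession A
    [45, 50, 55, 52, 47],  # Profession B
)
print("F-statistic:", f_stat)
print("p-value:", p_value)
print("Reject equal variances at alpha = 0.05:", reject)
```

Note that the F-test assumes both samples come from normal distributions, and with only five observations per group it has little power to detect a real difference in variances.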


In [None]:
# Q9: Question: Conduct a one-way ANOVA to test whether there are any statistically significant differences in 
# average heights between three different regions with the following data:
# Region A: [160, 162, 165, 158, 164]
# Region B: [172, 175, 170, 168, 174]
# Region C: [180, 182, 179, 185, 183]
# Task: Write Python code to perform the one-way ANOVA and interpret the results.
# Objective: Learn how to perform one-way ANOVA using Python and interpret the F-statistic and p-value.

In [3]:
# Ans: To conduct a one-way ANOVA to test for statistically significant differences in average heights between the three different regions, we can use 
# Python's `scipy.stats` library. Here's how to set up the data and perform the one-way ANOVA:

### Steps to Perform One-Way ANOVA:
# 1. **Prepare the Data**: Organize the height data for each region into separate arrays.
# 2. **Perform One-Way ANOVA**: Use the `f_oneway` function from the `scipy.stats` library.
# 3. **Interpret the Results**: Examine the F-statistic and the p-value to determine if there are significant differences among the group means.

### Given Data:
# **Region A**: [160, 162, 165, 158, 164]
# **Region B**: [172, 175, 170, 168, 174]
# **Region C**: [180, 182, 179, 185, 183]

### Python Code:
# Here's the Python code to perform the one-way ANOVA and interpret the results:

import numpy as np
from scipy.stats import f_oneway

# Given data for heights in different regions
region_a = np.array([160, 162, 165, 158, 164])
region_b = np.array([172, 175, 170, 168, 174])
region_c = np.array([180, 182, 179, 185, 183])

# Step 1: Perform one-way ANOVA
f_statistic, p_value = f_oneway(region_a, region_b, region_c)

# Step 2: Display the results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Step 3: Interpret the results
alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject the null hypothesis: There are significant differences in average heights among the regions.")
else:
    print("Fail to reject the null hypothesis: No significant differences in average heights among the regions.")

### Interpretation of the Results:
# - **F-statistic**: This value indicates the ratio of between-group variance to within-group variance. A larger F-statistic suggests that there are 
# greater differences among group means relative to the variation within each group.
# - **p-value**: This value is the probability of observing an F-statistic at least as large as the one computed, assuming the null hypothesis (all 
# group means are equal) is true. If the p-value is less than the significance level (typically 0.05), we reject the null hypothesis, which suggests 
# that at least one group mean differs from the others.

### Running the Code:
# Running the code prints the F-statistic and p-value, followed by the decision at the 0.05 significance level. For this data, F ≈ 67.87 and 
# p ≈ 2.87e-07, so the differences in average heights across the three regions are statistically significant.

F-statistic: 67.87330316742101
p-value: 2.870664187937026e-07
Reject the null hypothesis: There are significant differences in average heights among the regions.
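To make the "between-group variance over within-group variance" description concrete, the F-statistic can also be computed by hand and compared with `f_oneway`. The following is a minimal sketch; the variable names (`ss_between`, `ms_within`, and so on) are chosen here for illustration only.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([160, 162, 165, 158, 164]),  # Region A
    np.array([172, 175, 170, 168, 174]),  # Region B
    np.array([180, 182, 179, 185, 183]),  # Region C
]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)       # mean square between, df = k - 1
ms_within = ss_within / (n_total - k)   # mean square within, df = N - k

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)  # upper-tail probability

f_scipy, p_scipy = f_oneway(*groups)
print("Manual F:", f_manual, "| f_oneway F:", f_scipy)
print("Manual p:", p_manual, "| f_oneway p:", p_scipy)
```

A significant ANOVA result only says that at least one region's mean differs; it does not identify which pairs of regions differ.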
