# **Assignment** 7
## Week 12 assignment: Statistics, Statistics Advance 02


## Q1. Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio test. The function should return the F-value and the corresponding p-value for the test.

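A variance ratio test (F-test for equality of variances) compares the sample variances of two groups. The F-value is the ratio of the two sample variances, and the p-value comes from the F-distribution with `len(data1) - 1` and `len(data2) - 1` degrees of freedom. Note that `scipy.stats.f_oneway` is not suitable here: it performs a one-way ANOVA, which compares group means rather than variances. Here's a Python function that takes two arrays of data and performs the variance ratio test:

```python
import numpy as np
import scipy.stats as stats

def variance_ratio_test(data1, data2):
    """
    Perform a two-tailed variance ratio test (F-test) on two arrays of data.

    Parameters:
    - data1: First array of data
    - data2: Second array of data

    Returns:
    - f_value: The ratio of the sample variances (data1 over data2)
    - p_value: The corresponding two-tailed p-value
    """

    # Sample variances with the n - 1 denominator (ddof=1)
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)

    # F-value and its degrees of freedom
    f_value = var1 / var2
    dfn = len(data1) - 1
    dfd = len(data2) - 1

    # Two-tailed p-value: twice the smaller tail probability
    cdf = stats.f.cdf(f_value, dfn, dfd)
    p_value = 2 * min(cdf, 1 - cdf)

    return f_value, p_value

# Example usage:
data1 = [23, 27, 30, 21, 25]
data2 = [18, 20, 24, 17, 22]

f_value, p_value = variance_ratio_test(data1, data2)
print(f'F-value: {f_value}')
print(f'p-value: {p_value}')
```

In this code:

- The `variance_ratio_test` function takes two arrays of data, `data1` and `data2`, as input.
- It computes both sample variances with `np.var(..., ddof=1)` and divides them to obtain the F-value.
- It uses `stats.f.cdf` to obtain the two-tailed p-value as twice the smaller tail probability.
- The function returns the F-value and p-value as a tuple.

You can use this function by providing your own datasets in `data1` and `data2`, and it will return the F-value and p-value for the variance ratio test.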
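The difference between a variance ratio test and a one-way ANOVA can be seen on a small example. The sketch below uses made-up arrays `a` and `b` (illustrative assumptions, not assignment data) that have identical spread but different means, and compares the variance ratio with the result of `scipy.stats.f_oneway`:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data: same spread, means shifted by 10
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = a + 10.0

# Variance ratio: the sample variances are identical, so F is 1
f_var = np.var(a, ddof=1) / np.var(b, ddof=1)
cdf = stats.f.cdf(f_var, len(a) - 1, len(b) - 1)
p_var = 2 * min(cdf, 1 - cdf)

# One-way ANOVA: responds to the difference in means instead
f_anova, p_anova = stats.f_oneway(a, b)

print(f'Variance ratio F: {f_var:.4f}, p-value: {p_var:.4f}')
print(f'ANOVA F: {f_anova:.4f}, p-value: {p_anova:.6f}')
```

The variance ratio stays at 1 because the spread is unchanged, while the ANOVA F-value is large because the means differ.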

## Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

To calculate the critical F-value for a two-tailed test with a given significance level (\(\alpha\)) and degrees of freedom for the numerator and denominator of an F-distribution, you can use the `scipy.stats` library in Python. Here's a Python function to do that:

```python
import scipy.stats as stats

def critical_f_value(alpha, dfn, dfd):
    """
    Calculate the critical F-value for a two-tailed test.

    Parameters:
    - alpha: Significance level (e.g., 0.05 for a 95% confidence level)
    - dfn: Degrees of freedom for the numerator
    - dfd: Degrees of freedom for the denominator

    Returns:
    - critical_f: The critical F-value
    """

    # Calculate the critical F-value
    critical_f = stats.f.ppf(1 - alpha/2, dfn, dfd)

    return critical_f

# Example usage:
alpha = 0.05  # Significance level (e.g., 0.05 for a 95% confidence level)
dfn = 3       # Degrees of freedom for the numerator
dfd = 20      # Degrees of freedom for the denominator

critical_f = critical_f_value(alpha, dfn, dfd)
print(f'Critical F-value: {critical_f}')
```

In this code:

- The `critical_f_value` function takes three parameters: `alpha` (significance level), `dfn` (degrees of freedom for the numerator), and `dfd` (degrees of freedom for the denominator).
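- It uses `stats.f.ppf` (the inverse of the cumulative distribution function) to calculate the critical F-value for a two-tailed test. Passing `1 - alpha/2` leaves half of the significance level in the upper tail, so the result is the upper critical value; the matching lower critical value is `stats.f.ppf(alpha/2, dfn, dfd)`.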
- The function returns the critical F-value.

You can use this function by providing your own significance level, degrees of freedom for the numerator, and degrees of freedom for the denominator, and it will return the critical F-value for a two-tailed test.
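A two-tailed test has a lower and an upper critical value. As a minimal sketch, the helper below (the name `two_tailed_critical_values` is an assumption made for illustration) returns both and checks the reciprocal identity that printed F-tables rely on: the lower value equals one over the upper value taken with the degrees of freedom swapped.

```python
import scipy.stats as stats

def two_tailed_critical_values(alpha, dfn, dfd):
    """Return (lower, upper) critical F-values for a two-tailed test."""
    lower = stats.f.ppf(alpha / 2, dfn, dfd)
    upper = stats.f.ppf(1 - alpha / 2, dfn, dfd)
    return lower, upper

alpha, dfn, dfd = 0.05, 3, 20
lower, upper = two_tailed_critical_values(alpha, dfn, dfd)

# Upper critical value with numerator and denominator df swapped
upper_swapped = stats.f.ppf(1 - alpha / 2, dfd, dfn)

print(f'Lower critical F-value: {lower:.4f}')
print(f'Upper critical F-value: {upper:.4f}')
print(f'1 / upper critical value with swapped df: {1 / upper_swapped:.4f}')
```

The null hypothesis is rejected when the F-statistic falls below `lower` or above `upper`.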

## Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value for the test.

You can use Python with the `numpy` and `scipy.stats` libraries to generate random samples from two normal distributions with known variances and perform an F-test to determine if the variances are equal. Here's a Python program to do that:

```python
import numpy as np
import scipy.stats as stats

# Set the random seed for reproducibility
np.random.seed(0)

# Generate random samples from two normal distributions with known variances
variance1 = 4.0  # Variance of the first distribution
variance2 = 9.0  # Variance of the second distribution
sample_size1 = 30
sample_size2 = 30

sample1 = np.random.normal(loc=0, scale=np.sqrt(variance1), size=sample_size1)
sample2 = np.random.normal(loc=0, scale=np.sqrt(variance2), size=sample_size2)

# Perform an F-test to compare variances
f_statistic = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
dfn = sample_size1 - 1
dfd = sample_size2 - 1
p_value = 2 * min(stats.f.cdf(f_statistic, dfn, dfd), 1 - stats.f.cdf(f_statistic, dfn, dfd))

# Output the results
print(f'F-statistic: {f_statistic:.4f}')
print(f'Degrees of freedom (numerator): {dfn}')
print(f'Degrees of freedom (denominator): {dfd}')
print(f'p-value: {p_value:.4f}')

# Test the null hypothesis (H0: Variances are equal) based on the p-value and chosen significance level
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: Variances are not equal.")
else:
    print("Fail to reject the null hypothesis: No significant evidence that the variances differ.")
```

In this code:

- We generate random samples from two normal distributions with known variances (`variance1` and `variance2`) using `np.random.normal`.
- We calculate the F-statistic as the ratio of the two sample variances (computed with `ddof=1`), with `sample_size1 - 1` and `sample_size2 - 1` degrees of freedom.
- We calculate the two-tailed p-value as twice the smaller tail probability, using the cumulative distribution function of the F-distribution.
- We output the F-statistic, degrees of freedom for the numerator and denominator, and the p-value.
- Finally, we test the null hypothesis based on the p-value and chosen significance level. If the p-value is less than the significance level, we reject the null hypothesis, indicating that the variances are not equal; otherwise, we fail to reject the null hypothesis, meaning the samples do not provide significant evidence that the variances differ.
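One way to see what the significance level means for this test is a small simulation. The sketch below is an illustration under assumed settings (2000 replicates, a common standard deviation of 2, and an arbitrary seed, none of which come from the assignment): both samples are drawn from populations with the same variance, so every rejection is a false one, and the share of rejections should sit close to `alpha`.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_reps, n1, n2, alpha = 2000, 30, 30, 0.05

# Both populations share the same variance, so H0 is true in every replicate
samples1 = rng.normal(loc=0.0, scale=2.0, size=(n_reps, n1))
samples2 = rng.normal(loc=0.0, scale=2.0, size=(n_reps, n2))

# One F-statistic and one two-tailed p-value per replicate
f_stats = samples1.var(axis=1, ddof=1) / samples2.var(axis=1, ddof=1)
cdf = stats.f.cdf(f_stats, n1 - 1, n2 - 1)
p_values = 2 * np.minimum(cdf, 1 - cdf)

rejection_rate = np.mean(p_values < alpha)
print(f'Share of false rejections: {rejection_rate:.3f} (nominal level {alpha})')
```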

## Q4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

To conduct an F-test at the 5% significance level to determine if the variances of two populations are significantly different, you can follow these steps:

1. Set up the null and alternative hypotheses:
   - Null Hypothesis (\(H_0\)): The variances are equal (\(\sigma_1^2 = \sigma_2^2\)).
   - Alternative Hypothesis (\(H_1\)): The variances are not equal (\(\sigma_1^2 \neq \sigma_2^2\)).

2. Determine the significance level (\(\alpha\)) for the test. In this case, it's 0.05.

3. Calculate the F-statistic using the formula:

   \[
   F = \frac{{\text{Variance of Population 1}}}{{\text{Variance of Population 2}}}
   \]

4. Determine the degrees of freedom for both populations:
   - Degrees of Freedom for Population 1 (\(df_1\)) = Sample Size for Population 1 - 1 = 12 - 1 = 11
   - Degrees of Freedom for Population 2 (\(df_2\)) = Sample Size for Population 2 - 1 = 12 - 1 = 11

5. Use the F-distribution tables or a statistical calculator to find the critical F-value for a two-tailed test at the 5% significance level with \(df_1\) and \(df_2\) degrees of freedom.

6. Compare the calculated F-statistic to the critical F-value:
   - If the calculated F-statistic is greater than the critical F-value or less than its reciprocal (the lower critical value, since \(df_1 = df_2\) here), it falls within the critical region: reject the null hypothesis (\(H_0\)) and conclude that the variances are significantly different.
   - Otherwise, fail to reject the null hypothesis.

Let's calculate the F-statistic and perform the F-test:

```python
import scipy.stats as stats

# Given variances
variance1 = 10
variance2 = 15

# Sample sizes
sample_size1 = 12
sample_size2 = 12

# Calculate the F-statistic
f_statistic = variance1 / variance2

# Degrees of freedom
df1 = sample_size1 - 1
df2 = sample_size2 - 1

# Significance level
alpha = 0.05

# Find the critical F-value for a two-tailed test
critical_f_value = stats.f.ppf(1 - alpha/2, df1, df2)

# Perform the F-test
if f_statistic > critical_f_value or f_statistic < 1/critical_f_value:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")
```

In this code:

- We calculate the F-statistic by dividing the variance of Population 1 by the variance of Population 2.
- We determine the degrees of freedom for both populations and the significance level (\(\alpha\)).
- We find the critical F-value for a two-tailed test at the 5% significance level.
- We compare the calculated F-statistic to the critical F-value and make a decision regarding the null hypothesis.

If the calculated F-statistic falls inside the critical (rejection) region, that is, above the upper critical value or below the lower one, we reject the null hypothesis, indicating that the variances are significantly different. Otherwise, we fail to reject the null hypothesis: the data do not show a significant difference between the variances.
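To make the decision easier to read off, the following sketch prints the F-statistic next to both critical values. It assumes, as the solution above does, that the stated variances 10 and 15 are used directly in the ratio with 11 degrees of freedom each; the variable names `lower_critical` and `upper_critical` are illustrative.

```python
import scipy.stats as stats

f_statistic = 10 / 15
df1 = df2 = 12 - 1
alpha = 0.05

# Both ends of the two-tailed critical region
lower_critical = stats.f.ppf(alpha / 2, df1, df2)
upper_critical = stats.f.ppf(1 - alpha / 2, df1, df2)
reject = f_statistic < lower_critical or f_statistic > upper_critical

print(f'F-statistic: {f_statistic:.4f}')
print(f'Critical region: F < {lower_critical:.4f} or F > {upper_critical:.4f}')
print('Reject H0' if reject else 'Fail to reject H0')
```

With a ratio of about 0.667 the statistic lies between the two critical values, so the sketch reports a failure to reject.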

## Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.

To determine if the manufacturer's claim about the variance of the product's diameter is justified, we can conduct an F-test at the 1% significance level. Here are the steps:

1. Set up the null and alternative hypotheses:
   - Null Hypothesis (\(H_0\)): The manufacturer's claim is justified, i.e., the population variance is 0.005 (\(\sigma^2 = 0.005\)).
   - Alternative Hypothesis (\(H_1\)): The manufacturer's claim is not justified, i.e., the population variance is not 0.005 (\(\sigma^2 \neq 0.005\)).

2. Determine the significance level (\(\alpha\)) for the test. In this case, it's 0.01 (1% significance level).

3. Calculate the F-statistic using the formula:

   \[
   F = \frac{{\text{Sample Variance}}}{{\text{Claimed Population Variance}}}
   \]

4. Determine the degrees of freedom:
   - Degrees of Freedom for the Sample (\(df_1\)) = Sample Size - 1 = 25 - 1 = 24
   - Degrees of Freedom for the Claimed Variance (\(df_2\)) = Large number, because the claimed variance is treated as a fixed value rather than an estimate. As \(df_2 \to \infty\), the F-distribution approaches that of a chi-square variable with \(df_1\) degrees of freedom divided by \(df_1\), so a large finite \(df_2\) is an approximation.

5. Use the F-distribution tables or a statistical calculator to find the critical F-value for a two-tailed test at the 1% significance level with \(df_1 = 24\) and a large number (e.g., 1000) for \(df_2\).

6. Compare the calculated F-statistic to the critical F-values:
   - If the calculated F-statistic is greater than the upper critical value or less than the lower critical value, it falls within the critical region: reject the null hypothesis (\(H_0\)), which suggests that the manufacturer's claim is not justified.
   - Otherwise, fail to reject the null hypothesis. The lower critical value equals the reciprocal of the upper one only when \(df_1 = df_2\), which is not the case here, so it is computed directly.

Let's calculate the F-statistic and perform the F-test:

```python
import scipy.stats as stats

# Given values
claimed_variance = 0.005
sample_size = 25
sample_variance = 0.006

# Calculate the F-statistic
f_statistic = sample_variance / claimed_variance

# Degrees of freedom
df1 = sample_size - 1
df2 = 1000  # A large number to approximate the population variance

# Significance level
alpha = 0.01

# Find the lower and upper critical F-values for a two-tailed test
lower_critical_f = stats.f.ppf(alpha/2, df1, df2)
upper_critical_f = stats.f.ppf(1 - alpha/2, df1, df2)

# Perform the F-test
if f_statistic > upper_critical_f or f_statistic < lower_critical_f:
    print("Reject the null hypothesis: The manufacturer's claim is not justified.")
else:
    print("Fail to reject the null hypothesis: No significant evidence against the manufacturer's claim.")
```

In this code:

- We calculate the F-statistic by dividing the sample variance by the claimed population variance.
- We determine the degrees of freedom for both the sample and the population and the significance level (\(\alpha\)).
- We find the critical F-value for a two-tailed test at the 1% significance level.
- We compare the calculated F-statistic to the critical F-value and make a decision regarding the manufacturer's claim.

If the calculated F-statistic falls inside the critical region, we reject the null hypothesis, indicating that the manufacturer's claim is not justified. If it falls outside the critical region, we fail to reject the null hypothesis: the sample does not provide significant evidence against the claim.
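For a single sample tested against a claimed variance, the test usually applied is the chi-square test for a variance rather than an F-test with a large stand-in for the denominator degrees of freedom. The sketch below shows that alternative as a cross-check, not as the method the assignment asks for; it assumes the diameters are normally distributed, and the variable names are illustrative.

```python
import scipy.stats as stats

claimed_variance = 0.005
sample_variance = 0.006
sample_size = 25
alpha = 0.01

# Test statistic: (n - 1) * s^2 / sigma_0^2, chi-square with n - 1 df under H0
df = sample_size - 1
chi2_statistic = df * sample_variance / claimed_variance

# Two-tailed critical values
lower_critical = stats.chi2.ppf(alpha / 2, df)
upper_critical = stats.chi2.ppf(1 - alpha / 2, df)
reject = chi2_statistic < lower_critical or chi2_statistic > upper_critical

print(f'Chi-square statistic: {chi2_statistic:.4f}')
print(f'Critical region: < {lower_critical:.4f} or > {upper_critical:.4f}')
print('Reject H0' if reject else 'Fail to reject H0')
```

The statistic is 24 × 0.006 / 0.005 = 28.8, which lies between the two critical values, so this check also fails to reject the claim.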

## Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.

To calculate the mean and variance of an F-distribution given the degrees of freedom for the numerator (\(df_1\)) and denominator (\(df_2\)), you can use the formulas for the mean and variance of the F-distribution. Here's a Python function to do that:

```python
def f_distribution_mean_and_variance(df1, df2):
    """
    Calculate the mean and variance of an F-distribution.

    Parameters:
    - df1: Degrees of freedom for the numerator
    - df2: Degrees of freedom for the denominator

    Returns:
    - mean: Mean of the F-distribution
    - variance: Variance of the F-distribution
    """

    # Mean of the F-distribution
    mean = df2 / (df2 - 2) if df2 > 2 else None

    # Variance of the F-distribution
    if df2 > 4:
        variance = (2 * (df2 ** 2) * (df1 + df2 - 2)) / (df1 * (df2 - 2) ** 2 * (df2 - 4))
    else:
        variance = None

    return mean, variance

# Example usage:
df1 = 5
df2 = 10

mean, variance = f_distribution_mean_and_variance(df1, df2)
print(f'Mean of the F-distribution: {mean}')
print(f'Variance of the F-distribution: {variance}')
```

In this code:

- The `f_distribution_mean_and_variance` function takes two parameters, `df1` and `df2`, which are the degrees of freedom for the numerator and denominator of the F-distribution, respectively.
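- It calculates the mean as \(df_2 / (df_2 - 2)\), defined for \(df_2 > 2\), and the variance as \(\frac{2\,df_2^2\,(df_1 + df_2 - 2)}{df_1\,(df_2 - 2)^2\,(df_2 - 4)}\), defined for \(df_2 > 4\).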
- The mean and variance are returned as a tuple.

Note that the calculations depend on the values of \(df_1\) and \(df_2\), and there are some special cases to consider, such as when \(df_2\) is less than or equal to 2 or when \(df_2\) is less than or equal to 4. These special cases are handled in the function.
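The closed-form expressions can be checked against SciPy, whose `stats.f.stats` method reports the moments of the distribution. A minimal sketch for the example values `df1 = 5` and `df2 = 10`:

```python
import scipy.stats as stats

df1, df2 = 5, 10

# Closed-form moments (valid here because df2 > 4)
mean_formula = df2 / (df2 - 2)
variance_formula = (2 * df2**2 * (df1 + df2 - 2)) / (df1 * (df2 - 2)**2 * (df2 - 4))

# The same moments as reported by SciPy
mean_scipy, variance_scipy = stats.f.stats(df1, df2, moments='mv')

print(f'Mean: formula {mean_formula:.6f}, SciPy {float(mean_scipy):.6f}')
print(f'Variance: formula {variance_formula:.6f}, SciPy {float(variance_scipy):.6f}')
```

For these degrees of freedom the mean is 10 / 8 = 1.25 and the variance is 2600 / 1920, about 1.3542.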

## Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.

To determine if the variances of two populations are significantly different, you can conduct an F-test at the 10% significance level. Here are the steps:

1. Set up the null and alternative hypotheses:
   - Null Hypothesis (\(H_0\)): The variances of the two populations are equal (\(\sigma_1^2 = \sigma_2^2\)).
   - Alternative Hypothesis (\(H_1\)): The variances of the two populations are not equal (\(\sigma_1^2 \neq \sigma_2^2\)).

2. Determine the significance level (\(\alpha\)) for the test. In this case, it's 0.10 (10% significance level).

3. Calculate the F-statistic using the formula:

   \[
   F = \frac{{\text{Sample Variance of Population 1}}}{{\text{Sample Variance of Population 2}}}
   \]

4. Determine the degrees of freedom for both populations:
   - Degrees of Freedom for Sample 1 (\(df_1\)) = Sample Size 1 - 1 = 10 - 1 = 9
   - Degrees of Freedom for Sample 2 (\(df_2\)) = Sample Size 2 - 1 = 15 - 1 = 14

5. Use the F-distribution tables or a statistical calculator to find the critical F-value for a two-tailed test at the 10% significance level with \(df_1\) and \(df_2\) degrees of freedom.

6. Compare the calculated F-statistic to the critical F-values:
   - If the calculated F-statistic is greater than the upper critical value or less than the lower critical value, it falls within the critical region: reject the null hypothesis (\(H_0\)) and conclude that the variances are significantly different.
   - Otherwise, fail to reject the null hypothesis. Because \(df_1 \neq df_2\) here, the lower critical value is not the reciprocal of the upper one and is computed directly.

Let's calculate the F-statistic and perform the F-test:

```python
import scipy.stats as stats

# Given sample variances
sample_variance1 = 25.0
sample_variance2 = 20.0

# Sample sizes
sample_size1 = 10
sample_size2 = 15

# Calculate the F-statistic
f_statistic = sample_variance1 / sample_variance2

# Degrees of freedom
df1 = sample_size1 - 1
df2 = sample_size2 - 1

# Significance level
alpha = 0.10

# Find the lower and upper critical F-values for a two-tailed test
lower_critical_f = stats.f.ppf(alpha/2, df1, df2)
upper_critical_f = stats.f.ppf(1 - alpha/2, df1, df2)

# Perform the F-test
if f_statistic > upper_critical_f or f_statistic < lower_critical_f:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")
```

In this code:

- We calculate the F-statistic by dividing the sample variance of Population 1 by the sample variance of Population 2.
- We determine the degrees of freedom for both samples and the significance level (\(\alpha\)).
- We find the critical F-value for a two-tailed test at the 10% significance level.
- We compare the calculated F-statistic to the critical F-value and make a decision regarding the null hypothesis.

If the calculated F-statistic falls inside the critical region, we reject the null hypothesis, indicating that the variances are significantly different. If it falls outside the critical region, we fail to reject the null hypothesis: the samples do not show a significant difference between the variances.
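The same decision can be reached through a p-value instead of critical values. The helper below is a sketch for the case where only summary statistics are available; its name `f_test_from_summary` and its signature are assumptions made for illustration.

```python
import scipy.stats as stats

def f_test_from_summary(var1, n1, var2, n2):
    """Two-tailed F-test from sample variances and sample sizes (hypothetical helper)."""
    f_statistic = var1 / var2
    df1, df2 = n1 - 1, n2 - 1
    # Twice the smaller tail probability gives the two-tailed p-value
    cdf = stats.f.cdf(f_statistic, df1, df2)
    p_value = 2 * min(cdf, 1 - cdf)
    return f_statistic, df1, df2, p_value

alpha = 0.10
f_statistic, df1, df2, p_value = f_test_from_summary(25.0, 10, 20.0, 15)

print(f'F-statistic: {f_statistic:.4f} with ({df1}, {df2}) degrees of freedom')
print(f'p-value: {p_value:.4f}')
print('Reject H0' if p_value < alpha else 'Fail to reject H0')
```

Rejecting when the p-value is below `alpha` is equivalent to the F-statistic falling in the two-tailed critical region.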

## Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

To determine if the variances of the waiting times at Restaurant A and Restaurant B are significantly different, you can conduct an F-test at the 5% significance level. Here are the steps:

1. Set up the null and alternative hypotheses:
   - Null Hypothesis (\(H_0\)): The variances of the waiting times at both restaurants are equal (\(\sigma_A^2 = \sigma_B^2\)).
   - Alternative Hypothesis (\(H_1\)): The variances of the waiting times at the two restaurants are not equal (\(\sigma_A^2 \neq \sigma_B^2\)).

2. Determine the significance level (\(\alpha\)) for the test. In this case, it's 0.05 (5% significance level).

3. Calculate the F-statistic using the formula:

   \[
   F = \frac{{\text{Sample Variance of Restaurant A}}}{{\text{Sample Variance of Restaurant B}}}
   \]

4. Determine the degrees of freedom for both samples:
   - Degrees of Freedom for Sample A (\(df_1\)) = Sample Size for Restaurant A - 1 = 7 - 1 = 6
   - Degrees of Freedom for Sample B (\(df_2\)) = Sample Size for Restaurant B - 1 = 6 - 1 = 5

5. Use the F-distribution tables or a statistical calculator to find the critical F-value for a two-tailed test at the 5% significance level with \(df_1\) and \(df_2\) degrees of freedom.

6. Compare the calculated F-statistic to the critical F-values:
   - If the calculated F-statistic is greater than the upper critical value or less than the lower critical value, it falls within the critical region: reject the null hypothesis (\(H_0\)) and conclude that the variances are significantly different.
   - Otherwise, fail to reject the null hypothesis. Because \(df_1 \neq df_2\) here, the lower critical value is not the reciprocal of the upper one and is computed directly.

Let's calculate the F-statistic and perform the F-test:

```python
import scipy.stats as stats

# Waiting times data
waiting_times_a = [24, 25, 28, 23, 22, 20, 27]
waiting_times_b = [31, 33, 35, 30, 32, 36]

# Calculate the sample variances
sample_variance_a = sum([(x - sum(waiting_times_a) / len(waiting_times_a))**2 for x in waiting_times_a]) / (len(waiting_times_a) - 1)
sample_variance_b = sum([(x - sum(waiting_times_b) / len(waiting_times_b))**2 for x in waiting_times_b]) / (len(waiting_times_b) - 1)

# Calculate the F-statistic
f_statistic = sample_variance_a / sample_variance_b

# Degrees of freedom
df1 = len(waiting_times_a) - 1
df2 = len(waiting_times_b) - 1

# Significance level
alpha = 0.05

# Find the lower and upper critical F-values for a two-tailed test
lower_critical_f = stats.f.ppf(alpha/2, df1, df2)
upper_critical_f = stats.f.ppf(1 - alpha/2, df1, df2)

# Perform the F-test
if f_statistic > upper_critical_f or f_statistic < lower_critical_f:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")
```

In this code:

- We calculate the sample variances for both restaurants.
- We determine the degrees of freedom for both samples and the significance level (\(\alpha\)).
- We find the critical F-value for a two-tailed test at the 5% significance level.
- We compare the calculated F-statistic to the critical F-value and make a decision regarding the null hypothesis.

If the calculated F-statistic falls inside the critical region, we reject the null hypothesis, indicating that the variances of the waiting times are significantly different. If it falls outside the critical region, we fail to reject the null hypothesis: the data do not show a significant difference between the variances.
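The hand-written variance expressions can be verified with the standard library, whose `statistics.variance` function uses the same \(n - 1\) denominator. A short sketch on the restaurant data:

```python
import statistics

waiting_times_a = [24, 25, 28, 23, 22, 20, 27]
waiting_times_b = [31, 33, 35, 30, 32, 36]

# statistics.variance computes the sample variance (n - 1 denominator)
sample_variance_a = statistics.variance(waiting_times_a)
sample_variance_b = statistics.variance(waiting_times_b)
f_statistic = sample_variance_a / sample_variance_b

print(f'Sample variance A: {sample_variance_a:.4f}')
print(f'Sample variance B: {sample_variance_b:.4f}')
print(f'F-statistic: {f_statistic:.4f}')
```

Working by hand, the sums of squared deviations are 328/7 for Restaurant A and 161/6 for Restaurant B, giving sample variances of about 7.8095 and 5.3667 and an F-statistic of about 1.455.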

## Q9. The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

To determine if the variances of the test scores of Group A and Group B are significantly different, you can conduct an F-test at the 1% significance level. Here are the steps:

1. Set up the null and alternative hypotheses:
   - Null Hypothesis (\(H_0\)): The variances of the test scores in both groups are equal (\(\sigma_A^2 = \sigma_B^2\)).
   - Alternative Hypothesis (\(H_1\)): The variances of the test scores in the two groups are not equal (\(\sigma_A^2 \neq \sigma_B^2\)).

2. Determine the significance level (\(\alpha\)) for the test. In this case, it's 0.01 (1% significance level).

3. Calculate the F-statistic using the formula:

   \[
   F = \frac{{\text{Sample Variance of Group A}}}{{\text{Sample Variance of Group B}}}
   \]

4. Determine the degrees of freedom for both samples:
   - Degrees of Freedom for Group A (\(df_1\)) = Sample Size for Group A - 1 = 6 - 1 = 5
   - Degrees of Freedom for Group B (\(df_2\)) = Sample Size for Group B - 1 = 6 - 1 = 5

5. Use the F-distribution tables or a statistical calculator to find the critical F-value for a two-tailed test at the 1% significance level with \(df_1\) and \(df_2\) degrees of freedom.

6. Compare the calculated F-statistic to the critical F-value:
   - If the calculated F-statistic is greater than the critical F-value or less than its reciprocal (due to the two-tailed nature of the test), reject the null hypothesis (\(H_0\)).
   - Otherwise, fail to reject the null hypothesis: the variances are not significantly different.

Let's calculate the F-statistic and perform the F-test:

```python
import scipy.stats as stats

# Test scores data
scores_group_a = [80, 85, 90, 92, 87, 83]
scores_group_b = [75, 78, 82, 79, 81, 84]

# Calculate the sample variances
sample_variance_group_a = sum([(x - sum(scores_group_a) / len(scores_group_a))**2 for x in scores_group_a]) / (len(scores_group_a) - 1)
sample_variance_group_b = sum([(x - sum(scores_group_b) / len(scores_group_b))**2 for x in scores_group_b]) / (len(scores_group_b) - 1)

# Calculate the F-statistic
f_statistic = sample_variance_group_a / sample_variance_group_b

# Degrees of freedom
df1 = len(scores_group_a) - 1
df2 = len(scores_group_b) - 1

# Significance level
alpha = 0.01

# Find the critical F-value for a two-tailed test
critical_f_value = stats.f.ppf(1 - alpha/2, df1, df2)

# Perform the F-test
if f_statistic > critical_f_value or f_statistic < 1/critical_f_value:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")
```

In this code:

- We calculate the sample variances for both groups.
- We determine the degrees of freedom for both samples and the significance level (\(\alpha\)).
- We find the critical F-value for a two-tailed test at the 1% significance level.
- We compare the calculated F-statistic to the critical F-value and make a decision regarding the null hypothesis.

If the calculated F-statistic falls inside the critical region, we reject the null hypothesis, indicating that the variances of the test scores are significantly different. If it falls outside the critical region, we fail to reject the null hypothesis: the data do not show a significant difference between the variances.
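A common textbook convention avoids the lower critical value altogether by placing the larger sample variance in the numerator, so that the F-statistic is at least 1 and only the upper critical value at \(\alpha/2\) is needed. The sketch below applies that convention to the test scores; it is an equivalent formulation of the two-tailed test, with variable names chosen for illustration.

```python
import statistics
import scipy.stats as stats

scores_group_a = [80, 85, 90, 92, 87, 83]
scores_group_b = [75, 78, 82, 79, 81, 84]
alpha = 0.01

var_a = statistics.variance(scores_group_a)
var_b = statistics.variance(scores_group_b)

# Put the larger sample variance in the numerator so that F >= 1
if var_a >= var_b:
    f_statistic = var_a / var_b
    dfn, dfd = len(scores_group_a) - 1, len(scores_group_b) - 1
else:
    f_statistic = var_b / var_a
    dfn, dfd = len(scores_group_b) - 1, len(scores_group_a) - 1

# Using alpha/2 in the upper tail keeps the test two-tailed
upper_critical = stats.f.ppf(1 - alpha / 2, dfn, dfd)
reject = f_statistic > upper_critical

print(f'F-statistic: {f_statistic:.4f}')
print(f'Upper critical F-value: {upper_critical:.4f}')
print('Reject H0' if reject else 'Fail to reject H0')
```

Here the sample variances are 593/30 and 305/30, so the F-statistic is about 1.944, which is below the upper critical value for (5, 5) degrees of freedom at this level.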