## Statistics Advance Assignment - 7
***By Shahequa Modabbera***

### Q1. Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio test. The function should return the F-value and the corresponding p-value for the test.

In [1]:
import scipy.stats as stats
import numpy as np

def variance_ratio_test(data1, data2):
    # Calculate the variances
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)

    # Calculate the F-value and p-value
    f_value = var1 / var2
    df1 = len(data1) - 1
    df2 = len(data2) - 1
    p_value = stats.f.sf(f_value, df1, df2)

    return f_value, p_value

In this function:
- `data1` and `data2` are the input arrays of data.
- `np.var` is used to calculate the sample variances, with `ddof=1` so that the divisor is n - 1 (the unbiased sample variance).
- `f_value` is calculated as the ratio of the variances (`var1 / var2`).
- `df1` and `df2` represent the degrees of freedom, which are calculated as the length of each data array minus 1.
- `stats.f.sf` evaluates the survival function (1 - cumulative distribution function) of the F-distribution at `f_value`, so the returned p-value is the upper-tail (one-sided) probability, appropriate for the alternative that the first variance is larger than the second.

You can use this function by passing your data arrays to it, like this:

In [2]:
data1 = [1, 2, 3, 4, 5]
data2 = [2, 4, 6, 8, 10]

f_value, p_value = variance_ratio_test(data1, data2)
print("F-value:", f_value)
print("p-value:", p_value)

F-value: 0.25
p-value: 0.896
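
Because the F-value here is below 1, the upper-tail p-value is large even though the second variance is four times the first. If the question is whether the variances differ in either direction, a two-sided p-value is more suitable. The following is a minimal sketch of such a variant; the name `variance_ratio_test_two_sided` is hypothetical, and doubling the smaller tail area is one common convention, not the only one.

```python
import numpy as np
from scipy import stats

def variance_ratio_test_two_sided(data1, data2):
    """Hypothetical two-sided variant of variance_ratio_test."""
    f_value = np.var(data1, ddof=1) / np.var(data2, ddof=1)
    df1, df2 = len(data1) - 1, len(data2) - 1
    # Tail areas on both sides of the observed F-value
    lower_tail = stats.f.cdf(f_value, df1, df2)
    upper_tail = stats.f.sf(f_value, df1, df2)
    # Double the smaller tail area, capped at 1
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return f_value, p_value

f_value, p_value = variance_ratio_test_two_sided([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print("F-value:", f_value)
print("Two-sided p-value:", round(p_value, 3))
```

For the same data as above, this gives an F-value of 0.25 and a two-sided p-value of about 0.208.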


### Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

Ans) The following Python function calculates the upper critical F-value for a two-tailed test, given a significance level and the degrees of freedom for the numerator and denominator:

In [3]:
import scipy.stats as stats

def critical_f_value(alpha, df_num, df_denom):
    # Calculate the critical F-value
    critical_value = stats.f.ppf(1 - alpha/2, df_num, df_denom)
    
    return critical_value

In this function:
- `alpha` is the significance level, typically set to 0.05 for a 95% confidence level.
- `df_num` represents the degrees of freedom for the numerator (the group or treatment).
- `df_denom` represents the degrees of freedom for the denominator (the error or residual).

The function uses `stats.f.ppf` from the `scipy.stats` module to calculate the percent-point function (inverse of the cumulative distribution function) for the F-distribution. By passing `1 - alpha/2` as the probability argument, we obtain the upper critical value, which leaves an area of `alpha/2` in the upper tail. A two-tailed test also has a lower critical value at probability `alpha/2`; it is not needed when the larger variance is placed in the numerator.

We can use this function by calling it with the desired significance level and degrees of freedom; it returns the critical F-value:

In [4]:
alpha = 0.05
df_num = 3
df_denom = 20

critical_value = critical_f_value(alpha, df_num, df_denom)
print("Critical F-value:", critical_value)

Critical F-value: 3.8586986662732143
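
A two-tailed rejection region has two boundaries. As a sketch, the hypothetical helper `critical_f_bounds` below (not part of the original answer) returns both the lower and the upper critical value using the same `stats.f.ppf` call.

```python
from scipy import stats

def critical_f_bounds(alpha, df_num, df_denom):
    """Hypothetical helper: lower and upper critical values for a two-tailed F-test."""
    lower = stats.f.ppf(alpha / 2, df_num, df_denom)      # area alpha/2 to the left
    upper = stats.f.ppf(1 - alpha / 2, df_num, df_denom)  # area alpha/2 to the right
    return lower, upper

lower, upper = critical_f_bounds(0.05, 3, 20)
print("Reject H0 if F <", lower, "or F >", upper)
```

The lower bound equals the reciprocal of the upper critical value with the degrees of freedom swapped, which is why printed F-tables usually list only upper values.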


### Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value for the test.

In [5]:
import numpy as np
from scipy.stats import f

# Set random seed for reproducibility
np.random.seed(42)

# Generate random samples from two normal distributions
sample_size_1 = 50
sample_size_2 = 50
mean_1 = 0
mean_2 = 0
variance_1 = 5
variance_2 = 8

sample_1 = np.random.normal(mean_1, np.sqrt(variance_1), sample_size_1)
sample_2 = np.random.normal(mean_2, np.sqrt(variance_2), sample_size_2)

# Calculate the F-value and the two-sided p-value
# (the question asks whether the variances are equal, so both tails count)
f_value = np.var(sample_1, ddof=1) / np.var(sample_2, ddof=1)
df_numerator = sample_size_1 - 1
df_denominator = sample_size_2 - 1
cdf_value = f.cdf(f_value, df_numerator, df_denominator)
p_value = 2 * min(cdf_value, 1 - cdf_value)

# Output the results
print('F-value:', f_value)
print('Degrees of Freedom (numerator, denominator):', df_numerator, df_denominator)
print(f'p-value: {p_value:.4f}')

F-value: 0.7127215864960785
Degrees of Freedom (numerator, denominator): 49 49
p-value: 0.2394
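
A single pair of samples shows only one outcome of the test. As a rough illustration, the sketch below repeats the experiment many times and reports how often a two-sided F-test rejects at the 5% level. The function name `rejection_rate`, the number of trials, and the seed are all assumptions made for this example.

```python
import numpy as np
from scipy.stats import f

def rejection_rate(var_1, var_2, n=50, alpha=0.05, trials=2000, seed=0):
    """Hypothetical helper: share of simulated two-sided F-tests that reject H0."""
    rng = np.random.default_rng(seed)
    # One row per simulated experiment
    a = rng.normal(0.0, np.sqrt(var_1), size=(trials, n))
    b = rng.normal(0.0, np.sqrt(var_2), size=(trials, n))
    f_values = a.var(axis=1, ddof=1) / b.var(axis=1, ddof=1)
    cdf = f.cdf(f_values, n - 1, n - 1)
    p_values = 2 * np.minimum(cdf, 1 - cdf)
    return float(np.mean(p_values < alpha))

rate_equal = rejection_rate(5, 5)    # H0 true: should be close to alpha
rate_unequal = rejection_rate(5, 8)  # H0 false: should be noticeably higher
print("Rejection rate, equal variances:", rate_equal)
print("Rejection rate, variances 5 and 8:", rate_unequal)
```

With equal variances the rejection rate should stay near the significance level. With variances 5 and 8 and only 50 observations per sample, the test still misses the difference in many runs, which is consistent with the non-significant result above.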


### Q4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

In [6]:
from scipy.stats import f

alpha = 0.05

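# Treated below as sample variances from 12 observations each (an assumption:
# the question calls them population variances)
variance_1 = 10
variance_2 = 15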
sample_size_1 = 12
sample_size_2 = 12

f_value = variance_1 / variance_2
print("F-value:", f_value)

df_numerator = sample_size_1 - 1
df_denominator = sample_size_2 - 1

cdf_value = f.cdf(f_value, df_numerator, df_denominator)
# Two-sided p-value, because H1 is that the variances differ in either direction
p_value = 2 * min(cdf_value, 1 - cdf_value)
print(f"P-value: {p_value:.4f}")

if p_value < alpha:
    print("The variances are significantly different.")
else:
    print("The variances are not significantly different.")

F-value: 0.6666666666666666
P-value: 0.5124
The variances are not significantly different.


### Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.

In [7]:
# Import the necessary libraries:
from scipy.stats import f

# Set the significance level:
alpha = 0.01

# Define the claim variance and the sample variance:
claim_variance = 0.005
sample_variance = 0.006

# Calculate the F-value:
f_value = sample_variance / claim_variance

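# Calculate the degrees of freedom:
# NOTE: the claimed variance is a fixed value, not an estimate from a second sample.
# Giving it n - 1 denominator degrees of freedom is a simplifying assumption.
n = 25
df_numerator = n - 1
df_denominator = n - 1

# Calculate the upper-tail p-value (the sample variance exceeds the claim, so F > 1):
p_value = f.sf(f_value, df_numerator, df_denominator)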

# Compare the p-value to the significance level:
if p_value < alpha:
    print("The claim about the variance is not justified.")
else:
    print("The claim about the variance is justified.")

The claim about the variance is justified.
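
When a sample variance is compared with a fixed claimed value and not with a second sample, the usual textbook tool is the chi-square test for a single variance. The sketch below applies it to the same numbers as a cross-check. It assumes a normal population and a two-sided alternative, and it is offered as an alternative view, not as the method the question prescribes.

```python
from scipy.stats import chi2

alpha = 0.01
n = 25
claim_variance = 0.005
sample_variance = 0.006

# Under H0, (n - 1) * s^2 / sigma0^2 follows a chi-square distribution with n - 1 df
chi2_stat = (n - 1) * sample_variance / claim_variance
upper_tail = chi2.sf(chi2_stat, n - 1)
p_value = 2 * min(upper_tail, 1 - upper_tail)

print("Chi-square statistic:", round(chi2_stat, 2))
if p_value < alpha:
    print("Reject the claimed variance.")
else:
    print("The data do not contradict the claimed variance.")
```

The statistic is 24 × 0.006 / 0.005 = 28.8, and the conclusion matches the one above: at the 1% level the claim is not rejected.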


### Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.

In [8]:
def calculate_f_distribution(df_numerator, df_denominator):
    # Mean of the F-distribution (defined only for df_denominator > 2)
    mean = df_denominator / (df_denominator - 2)
    
    # Variance of the F-distribution (defined only for df_denominator > 4)
    variance = (2 * (df_denominator**2) * (df_numerator + df_denominator - 2)) / ((df_numerator * (df_denominator - 2)**2 * (df_denominator - 4)))
    
    return mean, variance

df_numerator = 5
df_denominator = 10

mean, variance = calculate_f_distribution(df_numerator, df_denominator)

print("Mean:", mean)
print("Variance:", variance)

Mean: 1.25
Variance: 1.3541666666666667
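
The closed-form moments of an F-distribution with d1 numerator and d2 denominator degrees of freedom are mean = d2 / (d2 - 2) for d2 > 2 and variance = 2 d2² (d1 + d2 - 2) / (d1 (d2 - 2)² (d2 - 4)) for d2 > 4. As a quick sanity check, the sketch below asks SciPy for the same two moments through `stats.f.stats`.

```python
from scipy import stats

df_numerator, df_denominator = 5, 10

# moments="mv" requests the mean and the variance
mean, variance = stats.f.stats(df_numerator, df_denominator, moments="mv")
print("Mean:", float(mean))
print("Variance:", float(variance))
```

For d1 = 5 and d2 = 10 the mean is 10 / 8 = 1.25 and the variance is 2600 / 1920 ≈ 1.354.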


### Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.

To conduct an F-test to compare the variances of two populations, we can follow these steps:

1. Set up the hypotheses:
   - Null hypothesis (H0): The variances of the two populations are equal.
   - Alternative hypothesis (H1): The variances of the two populations are significantly different.

2. Choose the significance level (α). In this case, α = 0.10.

3. Calculate the F-statistic:
   - F-statistic = Sample Variance 1 / Sample Variance 2

4. Determine the critical F-value based on the degrees of freedom and the significance level. The degrees of freedom for the numerator (df1) is the first sample size minus 1 (10 - 1 = 9), and the degrees of freedom for the denominator (df2) is the second sample size minus 1 (15 - 1 = 14). For a two-tailed test, the upper critical value is taken at probability 1 - α/2.

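5. Compare the calculated F-statistic with the critical F-value (the larger sample variance is in the numerator, so F ≥ 1 and only the upper critical value needs checking):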
   - If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis. There is evidence to suggest that the variances are significantly different.
   - If the calculated F-statistic is less than or equal to the critical F-value, fail to reject the null hypothesis. There is not enough evidence to conclude that the variances are significantly different.

In [9]:
import scipy.stats as stats

# sample variances
sample_var1 = 25
sample_var2 = 20

# Sample sizes
n1 = 10
n2 = 15

# Degrees of freedom
df1 = n1 - 1
df2 = n2 - 1

# Calculate the F-statistic
f_statistic = sample_var1 / sample_var2
print("F-statistic:", f_statistic)

# Calculate the critical F-value
alpha = 0.10
critical_f_value = stats.f.ppf(1-alpha / 2, df1, df2)
print("Critical f-value:", critical_f_value)

# Perform the F-test
if f_statistic > critical_f_value:
    print("Reject the null hypothesis. The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis. The variances are not significantly different.")

F-statistic: 1.25
Critical f-value: 2.6457907352338195
Fail to reject the null hypothesis. The variances are not significantly different.


In this case, the calculated F-statistic is 1.25, and the upper critical F-value for a two-tailed test at the 10% significance level with (df1=9, df2=14) degrees of freedom is approximately 2.65. Since the calculated F-statistic is less than the critical F-value, we fail to reject the null hypothesis. Therefore, there is not enough evidence to conclude that the variances of the two populations are significantly different at the 10% significance level.
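
The same decision can be read from a confidence interval for the ratio of the two population variances: if the 90% interval contains 1, the test at the 10% level does not reject. This is a minimal sketch under the same normality assumption; the variable names `ci_lower` and `ci_upper` are introduced here for illustration.

```python
from scipy import stats

sample_var1, n1 = 25, 10
sample_var2, n2 = 20, 15
alpha = 0.10
df1, df2 = n1 - 1, n2 - 1

f_statistic = sample_var1 / sample_var2

# Dividing by the upper quantile gives the lower limit, and vice versa
ci_lower = f_statistic / stats.f.ppf(1 - alpha / 2, df1, df2)
ci_upper = f_statistic / stats.f.ppf(alpha / 2, df1, df2)

print("90% CI for the variance ratio:", (ci_lower, ci_upper))
print("Interval contains 1:", ci_lower <= 1 <= ci_upper)
```

The lower limit is 1.25 divided by the critical value computed above (about 2.65), which is below 1, so the interval is consistent with equal variances.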

### Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: 
### Restaurant A: 24, 25, 28, 23, 22, 20, 27; 
### Restaurant B: 31, 33, 35, 30, 32, 36. 
### Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

Ans) To conduct an F-test to compare the waiting-time variances of the two restaurants, we can follow these steps:

1. Set up the hypotheses:
   - Null hypothesis (H0): The variances of the two populations are equal.
   - Alternative hypothesis (H1): The variances of the two populations are significantly different.

2. Choose the significance level (α). In this case, α = 0.05.

3. Calculate the sample variances of each group:
   - Sample Variance of Restaurant A (s1^2)
   - Sample Variance of Restaurant B (s2^2)

4. Determine the degrees of freedom for each group:
   - Degrees of freedom for Restaurant A (df1) = n1 - 1, where n1 is the sample size of Restaurant A.
   - Degrees of freedom for Restaurant B (df2) = n2 - 1, where n2 is the sample size of Restaurant B.

5. Calculate the F-statistic:
   - F-statistic = s1^2 / s2^2

6. Determine the critical F-value based on the degrees of freedom and the significance level.

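7. Compare the calculated F-statistic with the critical F-value (this one-sided comparison is valid when the larger sample variance is in the numerator, so that F ≥ 1):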
   - If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis. There is evidence to suggest that the variances are significantly different.
   - If the calculated F-statistic is less than or equal to the critical F-value, fail to reject the null hypothesis. There is not enough evidence to conclude that the variances are significantly different.

In [10]:
import numpy as np
from scipy import stats

# Data for each group
restaurant_a = [24, 25, 28, 23, 22, 20, 27]
restaurant_b = [31, 33, 35, 30, 32, 36]

# sample variance
sample_var_a = np.var(restaurant_a, ddof=1)
sample_var_b = np.var(restaurant_b, ddof=1)

# sample sizes
n1 = len(restaurant_a)
n2 = len(restaurant_b)

# Degrees of freedom
df1 = n1 - 1
df2 = n2 - 1

# Calculate the F-statistic
f_statistic = sample_var_a / sample_var_b
print("F-statistic:", f_statistic)

# calculate the critical F-value
alpha = 0.05
critical_f_value = stats.f.ppf(1 - alpha / 2, df1, df2)
print("Critical F-value:", critical_f_value)

# Perform the F-test
if f_statistic > critical_f_value:
    print("Reject the null hypothesis. The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis. The variances are not significantly different.")

F-statistic: 1.4551907719609583
Critical F-value: 6.977701858535566
Fail to reject the null hypothesis. The variances are not significantly different.


In this case, the calculated F-statistic is approximately 1.455, and the upper critical F-value for a two-tailed test at the 5% significance level with (df1=6, df2=5) degrees of freedom is approximately 6.978. Since the calculated F-statistic is less than the critical F-value, we fail to reject the null hypothesis. Therefore, there is not enough evidence to conclude that the variances of the two populations are significantly different at the 5% significance level.
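
Comparing only against the upper critical value works here because Restaurant A happens to have the larger sample variance. A sketch of a helper that does not depend on the order of the arguments is shown below; the name `f_test_larger_on_top` is hypothetical, and the helper simply places the larger sample variance in the numerator before comparing.

```python
import numpy as np
from scipy import stats

def f_test_larger_on_top(sample_x, sample_y, alpha=0.05):
    """Hypothetical helper: two-tailed F-test with the larger variance in the numerator."""
    var_x = np.var(sample_x, ddof=1)
    var_y = np.var(sample_y, ddof=1)
    if var_x >= var_y:
        f_statistic, df1, df2 = var_x / var_y, len(sample_x) - 1, len(sample_y) - 1
    else:
        f_statistic, df1, df2 = var_y / var_x, len(sample_y) - 1, len(sample_x) - 1
    critical = stats.f.ppf(1 - alpha / 2, df1, df2)
    return f_statistic, critical, bool(f_statistic > critical)

restaurant_a = [24, 25, 28, 23, 22, 20, 27]
restaurant_b = [31, 33, 35, 30, 32, 36]

result_ab = f_test_larger_on_top(restaurant_a, restaurant_b)
result_ba = f_test_larger_on_top(restaurant_b, restaurant_a)
print("A vs B:", result_ab)
print("B vs A:", result_ba)
```

Both calls return the same F-statistic, critical value, and decision, so the result no longer depends on which restaurant is listed first.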

### Q9. The following data represent the test scores of two groups of students: 
### Group A: 80, 85, 90, 92, 87, 83;
### Group B: 75, 78, 82, 79, 81, 84. 
### Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

Ans) To conduct an F-test to compare the variances of two populations, we can follow these steps:

1. Set up the hypotheses:
   - Null hypothesis (H0): The variances of the two groups are equal.
   - Alternative hypothesis (H1): The variances of the two groups are significantly different.

2. Choose the significance level (α). In this case, α = 0.01.

3. Calculate the sample variances of each group:
   - Sample Variance of Group A (s1^2)
   - Sample Variance of Group B (s2^2)

4. Determine the degrees of freedom for each group:
   - Degrees of freedom for Group A (df1) = n1 - 1, where n1 is the sample size of Group A.
   - Degrees of freedom for Group B (df2) = n2 - 1, where n2 is the sample size of Group B.

5. Calculate the F-statistic:
   - F-statistic = s1^2 / s2^2

6. Determine the critical F-value based on the degrees of freedom and the significance level.

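7. Compare the calculated F-statistic with the critical F-value (this one-sided comparison is valid when the larger sample variance is in the numerator, so that F ≥ 1):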
   - If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis. There is evidence to suggest that the variances are significantly different.
   - If the calculated F-statistic is less than or equal to the critical F-value, fail to reject the null hypothesis. There is not enough evidence to conclude that the variances are significantly different.

In [11]:
import numpy as np 
import scipy.stats as stats

# data for each group
group_a = [80, 85, 90, 92, 87, 83]
group_b = [75, 78, 82, 79, 81, 84]

#sample variance 
sample_var_a = np.var(group_a, ddof=1)
sample_var_b = np.var(group_b, ddof=1)

# sample size
n1 = len(group_a)
n2 = len(group_b)

# Degrees of freedom
df1 = n1 - 1
df2 = n2 - 1

# calculate the F-statistic
f_statistic = sample_var_a / sample_var_b
print("F-statistic:", f_statistic)

# Calculate the critical F-value
alpha = 0.01
critical_f_value = stats.f.ppf(1 - alpha/2, df1, df2)
print("Critical F-value:", critical_f_value)

# perform the F-test
if f_statistic > critical_f_value:
    print("Reject the null hypothesis. The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis. The variances are not significantly different.")

F-statistic: 1.9442622950819677
Critical F-value: 14.939605459912224
Fail to reject the null hypothesis. The variances are not significantly different.


In this case, the calculated F-statistic is approximately 1.944, and the upper critical F-value for a two-tailed test at the 1% significance level with (df1=5, df2=5) degrees of freedom is approximately 14.940. Since the calculated F-statistic is less than the critical F-value, we fail to reject the null hypothesis. Therefore, there is not enough evidence to conclude that the variances of the two groups are significantly different at the 1% significance level.
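
For readers who want to verify the F-statistic by hand, the arithmetic can be written out as follows. The sample means are 517/6 ≈ 86.17 for Group A and 479/6 ≈ 79.83 for Group B, and each sample variance divides the sum of squared deviations by n - 1 = 5.

$$
s_A^2 = \frac{\sum_i (x_i - \bar{x}_A)^2}{n_A - 1} = \frac{98.8333}{5} \approx 19.767,
\qquad
s_B^2 = \frac{\sum_i (y_i - \bar{y}_B)^2}{n_B - 1} = \frac{50.8333}{5} \approx 10.167
$$

$$
F = \frac{s_A^2}{s_B^2} = \frac{593}{305} \approx 1.944
$$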