Q1. Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio test. The function should return the F-value and the corresponding p-value for the test.

In [12]:
import numpy as np
from scipy.stats import f

def variance_ratio_test(data1, data2):
    n1 = len(data1)
    n2 = len(data2)
    
    # Unbiased sample variances (ddof=1 for both samples)
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)
    
    F = var1 / var2
    # Upper-tail p-value P(F(n1-1, n2-1) >= F); same as 1 - f.cdf(F, n1-1, n2-1)
    p_value = f.sf(F, n1-1, n2-1)
    
    return F, p_value

data1 = [15,18,21,24,27]
data2 = [12,16,20,24,28]

F_val, p_val = variance_ratio_test(data1, data2)
print(f"F-value : {F_val:.4f}")
print(f"p-value : {p_val:.4f}")

F-value : 0.5625
p-value : 0.7045
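The function above reports an upper-tail p-value. When the question is whether the two variances differ in either direction, a two-sided p-value is often more appropriate. Below is a minimal sketch, assuming the usual convention of doubling the smaller tail probability; `variance_ratio_test_two_sided` is a hypothetical helper name, and it uses unbiased sample variances (`ddof=1`).

```python
import numpy as np
import scipy.stats as stats


def variance_ratio_test_two_sided(data1, data2):
    """Two-sided F-test for equal variances (hypothetical helper).

    Returns the F-value and a two-sided p-value obtained by doubling
    the smaller tail probability of F(n1 - 1, n2 - 1).
    """
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)
    df1, df2 = data1.size - 1, data2.size - 1

    # Unbiased sample variances (ddof=1) for both samples
    F = np.var(data1, ddof=1) / np.var(data2, ddof=1)

    # Lower and upper tail probabilities under H0: equal variances
    lower_tail = stats.f.cdf(F, df1, df2)
    upper_tail = stats.f.sf(F, df1, df2)
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return F, p_value


F_val, p_val = variance_ratio_test_two_sided([15, 18, 21, 24, 27], [12, 16, 20, 24, 28])
print(f"F-value : {F_val:.4f}")   # F-value : 0.5625
print(f"p-value : {p_val:.4f}")   # p-value : 0.5910
```

For these data the F-value is below 1, so the lower tail is the smaller one and the two-sided p-value is twice `f.cdf(F, 4, 4)`.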


Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

In [6]:
from scipy.stats import f

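def critical_fval(alpha, df_num, df_den):
    # Upper critical value of a two-tailed test: alpha/2 is placed in the upper tail.
    # The matching lower critical value is f.ppf(alpha/2, df_num, df_den).
    critical_val = f.ppf(1 - alpha/2, df_num, df_den)
    return critical_val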

alpha = 0.05
df_num = 3
df_den = 15

cv = critical_fval(alpha, df_num, df_den)
print("Critical F-value : ", cv)

Critical F-value :  4.152804030062877
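A two-tailed test has two rejection regions, so it can be useful to return both critical values. The sketch below assumes `alpha/2` in each tail; `critical_fvals_two_tailed` is a hypothetical helper. It also checks the reciprocal identity that lets printed tables, which usually list only upper points, be used for the lower tail.

```python
import scipy.stats as stats


def critical_fvals_two_tailed(alpha, df_num, df_den):
    """Return (lower, upper) critical F-values for a two-tailed test.

    Hypothetical helper: alpha/2 of the probability is placed in each tail.
    """
    lower = stats.f.ppf(alpha / 2, df_num, df_den)
    upper = stats.f.ppf(1 - alpha / 2, df_num, df_den)
    return lower, upper


alpha, df_num, df_den = 0.05, 3, 15
lower, upper = critical_fvals_two_tailed(alpha, df_num, df_den)
print("Lower critical F-value :", lower)
print("Upper critical F-value :", upper)

# Reciprocal identity: the lower alpha/2 point of F(d1, d2) equals
# 1 / (upper alpha/2 point of F(d2, d1)), because 1/F ~ F(d2, d1).
reciprocal = 1 / stats.f.ppf(1 - alpha / 2, df_den, df_num)
print("1 / upper point of F(15, 3) :", reciprocal)
```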


Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value for the test.

In [13]:
import numpy as np
import scipy.stats as stats

# Set the random seed for reproducibility
np.random.seed(42)

# Parameters for the normal distributions
mean_a = 100
variance_a = 25
mean_b = 110
variance_b = 36

# Sample sizes
n_a = 30
n_b = 30

# Generate random samples from the normal distributions
sample_a = np.random.normal(mean_a, np.sqrt(variance_a), n_a)
sample_b = np.random.normal(mean_b, np.sqrt(variance_b), n_b)

# Perform the F-test
variance_a_sample = np.var(sample_a, ddof=1)
variance_b_sample = np.var(sample_b, ddof=1)

df1 = n_a - 1
df2 = n_b - 1

F_statistic = variance_a_sample / variance_b_sample
# Two-sided p-value for H0: equal variances (double the smaller tail)
p_value = 2 * min(stats.f.cdf(F_statistic, df1, df2), stats.f.sf(F_statistic, df1, df2))

# Print the results
print("Sample Variance A:", variance_a_sample)
print("Sample Variance B:", variance_b_sample)
print("Degrees of Freedom:", df1, df2)
print("F-statistic:", F_statistic)
print(f"p-value (two-sided): {p_value:.4f}")


Sample Variance A: 20.250289234141302
Sample Variance B: 31.210246900790942
Degrees of Freedom: 29 29
F-statistic: 0.6488346374993949
p-value (two-sided): 0.2500
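As a sanity check on this procedure, one can simulate many pairs of samples from two normal populations with the *same* variance and count how often a two-sided F-test at the 5% level rejects. If the test is calibrated, the rejection rate should be close to 0.05. The settings below (seed, 2000 replications, common standard deviation 5) are assumptions chosen only for illustration.

```python
import numpy as np
import scipy.stats as stats

# Assumed simulation settings (not from the exercise): equal variances, so H0 is true
rng = np.random.default_rng(0)
n_reps, n_a, n_b, sigma, alpha = 2000, 30, 30, 5.0, 0.05

# Each row is one simulated sample; the means differ but the variances do not
a = rng.normal(100, sigma, size=(n_reps, n_a))
b = rng.normal(110, sigma, size=(n_reps, n_b))

# One F-statistic per simulated pair of samples
F = np.var(a, axis=1, ddof=1) / np.var(b, axis=1, ddof=1)

# Two-sided p-values, vectorised over all replications
p = 2 * np.minimum(stats.f.cdf(F, n_a - 1, n_b - 1), stats.f.sf(F, n_a - 1, n_b - 1))

rejection_rate = np.mean(p < alpha)
print("Empirical Type I error rate:", rejection_rate)
```

Because the F-test relies on normality, repeating this with a skewed distribution in place of `rng.normal` is a quick way to see how sensitive it is to that assumption.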


Q4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

In [14]:
import scipy.stats as stats

# Known population variances
variance_population1 = 10
variance_population2 = 15

# Sample sizes
n1 = 12
n2 = 12

# Degrees of freedom
df1 = n1 - 1
df2 = n2 - 1

# Calculate the F-statistic with the larger variance in the numerator, so the
# upper-tail critical value below is the relevant one (df1 == df2 here, so
# the degrees of freedom do not change when the variances are swapped)
F_statistic = max(variance_population1, variance_population2) / min(variance_population1, variance_population2)

# Upper-tail critical value from the F-distribution at alpha
# (a strictly two-sided test would use 1 - alpha/2)
alpha = 0.05
critical_value = stats.f.ppf(1 - alpha, df1, df2)

# Compare the F-statistic with the critical value
if F_statistic > critical_value:
    result = "Reject the null hypothesis. Variances are significantly different."
else:
    result = "Fail to reject the null hypothesis. Variances are not significantly different."

# Print the results
print("F-statistic:", F_statistic)
print("Critical value:", critical_value)
print(result)


F-statistic: 1.5
Critical value: 2.8179304699530863
Fail to reject the null hypothesis. Variances are not significantly different.
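For "significantly different" (a two-sided alternative), a common convention is to put the larger variance in the numerator, swap the degrees of freedom to match, and compare with the upper `alpha/2` point. The sketch below does that and also reports a two-sided p-value, so the critical-value and p-value decisions can be seen to agree. `two_sided_f_test` is a hypothetical helper; the inputs are the Q4 values.

```python
import scipy.stats as stats


def two_sided_f_test(var1, var2, df1, df2, alpha=0.05):
    """Two-sided F-test from two variance values (hypothetical helper).

    The larger variance goes in the numerator (with its own df), so only
    the upper critical value at alpha / 2 is needed.
    """
    if var1 >= var2:
        F, dfn, dfd = var1 / var2, df1, df2
    else:
        F, dfn, dfd = var2 / var1, df2, df1
    critical = stats.f.ppf(1 - alpha / 2, dfn, dfd)
    p_value = min(1.0, 2 * stats.f.sf(F, dfn, dfd))
    return F, critical, p_value, bool(F > critical)


F, crit, p, reject = two_sided_f_test(10, 15, 11, 11, alpha=0.05)
print("F-statistic:", F)
print("Critical value (upper alpha/2 point):", crit)
print("Two-sided p-value:", p)
print("Reject H0:", reject)
```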


Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.

In [15]:
import scipy.stats as stats

# Claimed population variance
variance_claimed = 0.005

# Sample variance and sample size
variance_sample = 0.006
n = 25

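# Degrees of freedom. The claimed variance is a population value, so strictly
# the denominator df is infinite (equivalent to a chi-square test); using
# n - 1 for both gives a larger critical value, i.e. a more conservative test.
df1 = n - 1
df2 = n - 1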

# Calculate the F-statistic
F_statistic = variance_sample / variance_claimed

# Calculate the critical value from the F-distribution
alpha = 0.01
critical_value = stats.f.ppf(1 - alpha, df1, df2)

# Compare the F-statistic with the critical value
if F_statistic < critical_value:
    result = "Fail to reject the null hypothesis. The claim is justified."
else:
    result = "Reject the null hypothesis. The claim is not justified."

# Print the results
print("F-statistic:", F_statistic)
print("Critical value:", critical_value)
print(result)


F-statistic: 1.2
Critical value: 2.659072104348157
Fail to reject the null hypothesis. The claim is justified.
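When one side of the comparison is a claimed population variance rather than a second sample, the textbook test for a single variance is the chi-square test with statistic (n - 1)s²/σ0². It is equivalent to an F-test with (n - 1, ∞) degrees of freedom. The sketch below, assuming the same one-sided (upper-tail) alternative as the cell above, shows that calculation with the Q5 numbers.

```python
import scipy.stats as stats

# Values from the exercise
variance_claimed = 0.005
variance_sample = 0.006
n = 25
alpha = 0.01

df = n - 1

# Chi-square statistic for a single variance against a fixed value
chi2_stat = df * variance_sample / variance_claimed
chi2_crit = stats.chi2.ppf(1 - alpha, df)   # upper-tail critical value
p_value = stats.chi2.sf(chi2_stat, df)

print("Chi-square statistic:", chi2_stat)
print("Critical value:", chi2_crit)
print("p-value:", p_value)

# Equivalent F-form: F = s^2 / sigma0^2 with (n - 1, infinity) df,
# whose critical value is chi2_crit / df
print("Equivalent F critical value:", chi2_crit / df)
print("Reject H0:", chi2_stat > chi2_crit)
```

Here the equivalent F critical value (about 1.79) is smaller than the F(24, 24) value used above, but the statistic of 1.2 still falls below it, so the conclusion does not change.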


Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.

In [3]:
def f_distribution_mean_var(df_num, df_den):
    """Return (mean, variance) of the F(df_num, df_den) distribution."""
    if df_num <= 0 or df_den <= 0:
        raise ValueError("Degrees of freedom must be greater than 0.")

    # The mean is finite only for df_den > 2
    if df_den > 2:
        mean = df_den / (df_den - 2)
    else:
        mean = float('inf')

    # The variance is finite only for df_den > 4; it is infinite for
    # 2 < df_den <= 4 and undefined (NaN) when the mean itself is infinite
    if df_den > 4:
        variance = (2 * (df_den **2) * (df_num + df_den -2)) / (df_num * (df_den - 2) **2 * (df_den -4))
    elif df_den > 2:
        variance = float('inf')
    else:
        variance = float('nan')

    return mean, variance


df_num = 5
df_den = 10
m, v = f_distribution_mean_var(df_num, df_den)
print(f"Mean : {m}, Variance : {v}")

Mean : 1.25, Variance : 1.3541666666666667
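The closed-form moments can be cross-checked against SciPy's F distribution, which exposes them through `scipy.stats.f.stats(dfn, dfd, moments="mv")`. The sketch below restates the two formulas for `df_den > 4` in a small hypothetical helper and compares them for a few arbitrary degree-of-freedom pairs.

```python
import math
import scipy.stats as stats


def f_moments(df_num, df_den):
    """Closed-form mean and variance of F(df_num, df_den), valid for df_den > 4."""
    mean = df_den / (df_den - 2)
    variance = 2 * df_den**2 * (df_num + df_den - 2) / (
        df_num * (df_den - 2) ** 2 * (df_den - 4)
    )
    return mean, variance


# Compare against SciPy for a few (df_num, df_den) pairs with df_den > 4
results = []
for d1, d2 in [(5, 10), (3, 15), (12, 7), (1, 30)]:
    m_formula, v_formula = f_moments(d1, d2)
    m_scipy, v_scipy = stats.f.stats(d1, d2, moments="mv")
    results.append((d1, d2, m_formula, float(m_scipy), v_formula, float(v_scipy)))
    print(f"F({d1}, {d2}): mean {m_formula:.6f} vs {float(m_scipy):.6f}, "
          f"variance {v_formula:.6f} vs {float(v_scipy):.6f}")
```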


Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.

In [5]:
import scipy.stats as stats

var1 = 25
var2 = 20

n1 = 10
n2 = 15

df1 = n1-1
df2 = n2-1

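# Named F_statistic (not f) so it does not overwrite scipy.stats.f imported in earlier cells
F_statistic = var1 / var2

alpha = 0.10
# Upper-tail critical value at alpha (a strictly two-sided test would use 1 - alpha/2)
cv = stats.f.ppf(1-alpha, df1, df2)

if F_statistic > cv:
    result = "Reject the null hypothesis : variances are significantly different."
else:
    result = "We fail to reject the null hypothesis : variances are not significantly different."


print("F-statistic : ", F_statistic)
print("Critical value : ", cv)
print(result)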

F-statistic :  1.25
Critical value :  2.121954566976902
We fail to reject the null hypothesis : variances are not significantly different.
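Because (s1²/σ1²)/(s2²/σ2²) follows F(df1, df2) for normal samples, the same quantiles also give a confidence interval for the ratio σ1²/σ2². The sketch below, assuming the Q7 summary values and a two-sided 90% level, computes that interval; a two-sided test at level alpha rejects equal variances exactly when 1 lies outside it.

```python
import scipy.stats as stats

# Summary statistics from the exercise
var1, var2 = 25, 20
n1, n2 = 10, 15
alpha = 0.10

df1, df2 = n1 - 1, n2 - 1
F_obs = var1 / var2

# From F_lo <= F_obs * sigma2^2 / sigma1^2 <= F_hi with probability 1 - alpha:
# sigma1^2 / sigma2^2 lies in [F_obs / F_hi, F_obs / F_lo]
lower = F_obs / stats.f.ppf(1 - alpha / 2, df1, df2)
upper = F_obs / stats.f.ppf(alpha / 2, df1, df2)

print(f"{100 * (1 - alpha):.0f}% CI for sigma1^2 / sigma2^2: ({lower:.3f}, {upper:.3f})")

# Duality with the two-sided test: fail to reject H0 when 1 is inside the interval
print("1 inside interval (fail to reject):", lower <= 1 <= upper)
```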


Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

In [8]:
import scipy.stats as stats
import numpy as np

data_a = [24, 25, 28, 23, 22, 20, 27]
data_b = [31, 33, 35, 30, 32, 36]

# Unbiased sample variances (ddof=1), as the F-test requires
var_a = np.var(data_a, ddof=1)
var_b = np.var(data_b, ddof=1)

n_a = len(data_a)
n_b = len(data_b)

df1 = n_a - 1
df2 = n_b - 1

F_statistic = var_a / var_b

alpha = 0.05
# Upper-tail critical value at alpha (a strictly two-sided test would use 1 - alpha/2)
cv = stats.f.ppf(1-alpha, df1, df2)

if F_statistic > cv:
    r = "Reject the null hypothesis : variances are significantly different."
else:
    r = "Fail to reject the null hypothesis : variances are not significantly different."

print(f"F-statistic : {F_statistic:.2f}")
print(f"Critical Value : {cv:.2f}")
print(r)

F-statistic : 1.46
Critical Value : 4.95
Fail to reject the null hypothesis : variances are not significantly different.
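The F-test compares unbiased sample variances, which divide by n - 1. `np.var` divides by n unless `ddof=1` is passed, and with unequal sample sizes the two choices give different F-values. The sketch below compares both on the Q8 data and cross-checks against the standard library's `statistics.variance`, which uses the n - 1 divisor.

```python
import statistics
import numpy as np

data_a = [24, 25, 28, 23, 22, 20, 27]
data_b = [31, 33, 35, 30, 32, 36]

# ddof=0 divides by n (population variance); ddof=1 divides by n - 1
ratio_ddof0 = np.var(data_a) / np.var(data_b)
ratio_ddof1 = np.var(data_a, ddof=1) / np.var(data_b, ddof=1)

# statistics.variance also divides by n - 1
ratio_stdlib = float(statistics.variance(data_a)) / float(statistics.variance(data_b))

print(f"ddof=0 ratio: {ratio_ddof0:.4f}")   # ddof=0 ratio: 1.4968
print(f"ddof=1 ratio: {ratio_ddof1:.4f}")   # ddof=1 ratio: 1.4552
print(f"statistics  : {ratio_stdlib:.4f}")  # statistics  : 1.4552
```

When both samples have the same size, the n versus n - 1 factors cancel in the ratio, which is why the distinction only shows up here with 7 and 6 observations.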


Q9. The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

In [10]:
import scipy.stats as stats
import numpy as np

data_a = [80, 85, 90, 92, 87, 83]
data_b = [75, 78, 82, 79, 81, 84]

# Unbiased sample variances (ddof=1)
var_a = np.var(data_a, ddof=1)
var_b = np.var(data_b, ddof=1)

n_a = len(data_a)
n_b = len(data_b)

df_a = n_a - 1
df_b = n_b - 1

F_statistic = var_a / var_b

alpha = 0.01
# Use df_a and df_b from this cell (not df1/df2 from the previous cell).
# Upper-tail critical value at alpha (a strictly two-sided test would use 1 - alpha/2)
cv = stats.f.ppf(1-alpha, df_a, df_b)

if F_statistic > cv:
    r = "Reject the null hypothesis : Variances are significantly different."
else:
    r = "We fail to reject the null hypothesis : Variances are not significantly different."

print(f"F-statistic : {F_statistic:.2f}")
print(f"Critical value : {cv:.2f}")
print(r)

F-statistic : 1.94
Critical value : 10.97
We fail to reject the null hypothesis : Variances are not significantly different.
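In a notebook, variables such as `df1` and `df2` persist between cells, so a later cell can silently reuse degrees of freedom from an earlier one. One way to avoid that is to compute everything inside a function from the data themselves. The sketch below is a hypothetical helper that mirrors the upper-tail comparison used in these cells; it is applied to the Q9 data at the 1% level.

```python
import numpy as np
import scipy.stats as stats


def f_test_from_samples(data_a, data_b, alpha=0.05):
    """Upper-tail F-test computed from raw samples (hypothetical helper).

    Degrees of freedom are derived from the inputs inside the function,
    so they cannot be picked up by accident from another cell.
    Returns (F, (df_a, df_b), critical value, reject H0?).
    """
    a = np.asarray(data_a, dtype=float)
    b = np.asarray(data_b, dtype=float)
    df_a, df_b = a.size - 1, b.size - 1

    # Unbiased sample variances
    F = np.var(a, ddof=1) / np.var(b, ddof=1)

    # Upper-tail critical value at level alpha
    critical = stats.f.ppf(1 - alpha, df_a, df_b)
    return F, (df_a, df_b), critical, bool(F > critical)


F, dfs, cv, reject = f_test_from_samples(
    [80, 85, 90, 92, 87, 83], [75, 78, 82, 79, 81, 84], alpha=0.01
)
print(f"F = {F:.2f}, df = {dfs}, critical = {cv:.2f}, reject H0: {reject}")
```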
