## Q1. Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio test. The function should return the F-value and the corresponding p-value for the test.


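In [1]:
import numpy as np
import scipy.stats as stats

In [2]:
def variance_ratio_test(data_1, data_2):
    """Two-tailed F-test for the equality of two population variances.

    Returns the F-value (ratio of the two sample variances) and its p-value.
    """
    # Sample variances (ddof=1 divides by n - 1)
    var_1 = np.var(data_1, ddof=1)
    var_2 = np.var(data_2, ddof=1)
    f_value = var_1 / var_2

    # Degrees of freedom of the numerator and denominator
    df_1 = len(data_1) - 1
    df_2 = len(data_2) - 1

    # Two-tailed p-value: double the smaller tail probability
    p_value = 2 * min(stats.f.cdf(f_value, df_1, df_2), stats.f.sf(f_value, df_1, df_2))
    return f_value, p_value

In [3]:
group_a = [80, 85, 90, 92, 87, 83]
group_b = [75, 78, 82, 79, 81, 84]

f_value, p_value = variance_ratio_test(group_a, group_b)

print('Degrees of freedom 1:', len(group_a) - 1)
print('Degrees of freedom 2:', len(group_b) - 1)
print(f"F-statistic: {f_value:.4f}")
print(f"p-value: {p_value:.4f}")

Degrees of freedom 1: 5
Degrees of freedom 2: 5
F-statistic: 1.9443
p-value: 0.4831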


## Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test
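
The notebook cell that follows runs a two-way ANOVA rather than computing a critical value. A minimal sketch of what the question asks is below, assuming the two-tailed test splits `alpha` equally between the two tails; it uses `scipy.stats.f.ppf` (the inverse CDF of the F-distribution), and the function name `critical_f_values` is illustrative.

```python
import scipy.stats as stats


def critical_f_values(alpha, dfn, dfd):
    """Return the lower and upper critical F-values for a two-tailed test.

    The rejection region places alpha / 2 in each tail of F(dfn, dfd).
    """
    lower = stats.f.ppf(alpha / 2, dfn, dfd)
    upper = stats.f.ppf(1 - alpha / 2, dfn, dfd)
    return lower, upper


lower, upper = critical_f_values(0.05, 5, 5)
print(f"Lower critical value: {lower:.4f}")
print(f"Upper critical value: {upper:.4f}")
```

An F-statistic below `lower` or above `upper` leads to rejecting the hypothesis of equal variances at level `alpha`.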

In [9]:
# Importing libraries 
import statsmodels.api as sm 
from statsmodels.formula.api import ols 
import numpy as np
import pandas as pd

#create data
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})


#perform two-way ANOVA
model = ols('height ~ C(water) + C(sun) + C(water):C(sun)', data=df).fit()
sm.stats.anova_lm(model, typ=2)



| Source | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(water) | 8.533333 | 1.0 | 16.0 | 0.000527 |
| C(sun) | 24.866667 | 2.0 | 23.3125 | 2e-06 |
| C(water):C(sun) | 2.466667 | 2.0 | 2.3125 | 0.120667 |
| Residual | 12.8 | 24.0 | | |


The table gives the following p-values for each factor:

- water: p-value = 0.000527
- sun: p-value = 0.000002
- water × sun: p-value = 0.120667

Since the p-values for water and sun are both less than 0.05, both factors have a statistically significant effect on plant height.

Because the p-value for the interaction effect (0.120667) is greater than 0.05, there is no evidence of a significant interaction between sunlight exposure and watering frequency.

Note: Although the ANOVA results tell us that watering frequency and sunlight exposure have a statistically significant effect on plant height, we would need to perform post-hoc tests to determine exactly how different levels of water and sunlight affect plant height.
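
As a sketch of the post-hoc step mentioned in the note, Tukey's HSD from statsmodels (`pairwise_tukeyhsd`) compares every pair of sunlight levels. This assumes the same plant-height data as the ANOVA cell and, as a simplification, groups by sunlight only (ignoring the watering factor), so it does not reuse the residual variance of the two-way model.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same plant-height data as the two-way ANOVA cell
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# Pairwise comparisons of mean height across the three sunlight levels,
# controlling the family-wise error rate at 0.05
tukey = pairwise_tukeyhsd(endog=df['height'], groups=df['sun'], alpha=0.05)
print(tukey)
```

The printed table lists each pair of sunlight levels with its mean difference, adjusted p-value, and whether the difference is significant.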

## Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value for the test.


In [18]:
import numpy as np

var_1 = float(input("Enter the Variance_1:"))
var_2 = float(input("Enter the Variance_2:"))

mean_1 = 10
mean_2 = 10

size_1 = 10
size_2 = 10

#generating the data

normal_data_1 = np.random.normal(loc=mean_1, scale=np.sqrt(var_1), size=size_1)
normal_data_2 = np.random.normal(loc=mean_2, scale=np.sqrt(var_2), size=size_2)

print(normal_data_1)
print(normal_data_2)

Enter the Variance_1: 2
Enter the Variance_2: 7


[10.10179123 10.19868154  9.74093234  8.47681257 10.50526331 10.37829332
  9.22167887  8.91074821 11.14931839 10.5591079 ]
[ 8.48108999  7.59227763 10.63535518  6.66884585  7.72118972 13.32390537
  8.94261369  9.23793925  9.15201103  7.14034289]


In [21]:
import scipy.stats as stats

# Sample variances of the generated data (ddof=1 divides by n - 1)
sample_var_1 = np.var(normal_data_1, ddof=1)
sample_var_2 = np.var(normal_data_2, ddof=1)

# F-statistic: ratio of the two sample variances
f_value = sample_var_1 / sample_var_2

# Degrees of freedom
df1 = size_1 - 1
df2 = size_2 - 1

# Two-tailed p-value from the F-distribution's CDF and survival function
p_value = 2 * min(stats.f.cdf(f_value, df1, df2), stats.f.sf(f_value, df1, df2))

# Print the results
print('Degrees of freedom 1:', df1)
print('Degrees of freedom 2:', df2)
print(f"F-statistic: {f_value:.4f}")
print(f"p-value: {p_value:.4f}")

# Significance level of 5%
alpha = 0.05

if p_value < alpha:
    print("Reject the null hypothesis that Var(X) == Var(Y)")
else:
    print("Fail to reject the null hypothesis that Var(X) == Var(Y)")
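
A quick way to sanity-check the two-tailed F-test is a small simulation, sketched below under the assumption that both samples come from normal distributions with the same variance. The null hypothesis is then true, so the test should reject in roughly `alpha` (5%) of repetitions. The seed, sample size, variance, and repetition count are arbitrary illustrative choices.

```python
import numpy as np
import scipy.stats as stats

# Illustrative settings (arbitrary choices)
rng = np.random.default_rng(seed=0)
alpha = 0.05
n_reps = 4000
size = 10
variance = 4.0

# Both samples come from the same normal distribution, so H0 (equal variances) is true
x = rng.normal(loc=10, scale=np.sqrt(variance), size=(n_reps, size))
y = rng.normal(loc=10, scale=np.sqrt(variance), size=(n_reps, size))

# One F-statistic per repetition, using sample variances (ddof=1)
f_values = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
df = size - 1

# Two-tailed p-values, computed element-wise
p_values = 2 * np.minimum(stats.f.cdf(f_values, df, df), stats.f.sf(f_values, df, df))

rejection_rate = np.mean(p_values < alpha)
print(f"Empirical rejection rate: {rejection_rate:.3f} (nominal alpha = {alpha})")
```

A rate close to 0.05 suggests the test holds its nominal size; generating the two samples with different variances would instead estimate the test's power.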


## Q4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.


In [22]:
# Provided values
var_11 = 10
var_12 = 15

sample_size_both = 12

# Significance level of 5%
alpha = 0.05

In [23]:
import scipy.stats as stat

f_value = var_11 / var_12

df_both = sample_size_both - 1

# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stat.f.cdf(f_value, df_both, df_both), stat.f.sf(f_value, df_both, df_both))

print(f"F-statistic: {f_value:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis that Var(X) == Var(Y)")
else:
    print("Fail to reject the null hypothesis that Var(X) == Var(Y)")

F-statistic: 0.6667
p-value: 0.5124
Fail to reject the null hypothesis that Var(X) == Var(Y)
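
Some textbooks instead put the larger variance in the numerator and double the upper-tail probability. For a two-tailed test this gives the same p-value as doubling the smaller tail of the ratio in its original order, because 1/F follows F(df2, df1). A quick check with the Q4 numbers (hard-coded here):

```python
import scipy.stats as stats

var_1, var_2 = 10, 15
df1 = df2 = 12 - 1

# Convention 1: keep the given order and double the smaller tail
f_value = var_1 / var_2
p_given_order = 2 * min(stats.f.cdf(f_value, df1, df2), stats.f.sf(f_value, df1, df2))

# Convention 2: larger variance on top and double the upper tail
# (the numerator and denominator degrees of freedom swap along with the variances)
f_larger_on_top = var_2 / var_1
p_larger_on_top = 2 * stats.f.sf(f_larger_on_top, df2, df1)

print(f"Given order:      {p_given_order:.4f}")
print(f"Larger on top:    {p_larger_on_top:.4f}")
```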


## Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.


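In [24]:
# Provided values
var_p1 = 0.005   # claimed population variance
var_s1 = 0.006   # sample variance

sample_size = 25

# Significance level of 1%
alpha = 0.01


In [28]:
import scipy.stats as stat

degree_freedom = sample_size - 1

# The claimed variance is a fixed value, not a second sample estimate, so the
# ratio s^2 / sigma_0^2 follows F(n - 1, infinity); equivalently,
# (n - 1) * s^2 / sigma_0^2 follows a chi-square distribution with n - 1 df
f_value = var_s1 / var_p1
chi2_value = degree_freedom * f_value

# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stat.chi2.cdf(chi2_value, degree_freedom), stat.chi2.sf(chi2_value, degree_freedom))

print(f"F-statistic: {f_value:.4f}")
print(f"Chi-square statistic: {chi2_value:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis that the population variance equals 0.005")
else:
    print("Fail to reject the null hypothesis that the population variance equals 0.005")

F-statistic: 1.2000
Chi-square statistic: 28.8000
p-value: 0.4555
Fail to reject the null hypothesis that the population variance equals 0.005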
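
Comparing one sample variance with a fixed claimed value is usually done with the chi-square statistic (n - 1)s²/σ0². This is the same as an F-test whose denominator has infinitely many degrees of freedom, since s²/σ0² follows F(n - 1, ∞). The sketch below checks this numerically with the Q5 values; the denominator df of 10**6 stands in for infinity and is an arbitrary choice.

```python
import scipy.stats as stats

s2 = 0.006        # sample variance
sigma0_2 = 0.005  # claimed population variance
n = 25
df = n - 1

ratio = s2 / sigma0_2

# Upper-tail probability from the chi-square form of the test
p_chi2 = stats.chi2.sf(df * ratio, df)

# Upper-tail probability from an F-distribution with a very large denominator df
p_f = stats.f.sf(ratio, df, 10**6)

print(f"chi-square upper tail: {p_chi2:.4f}")
print(f"F(24, 1e6) upper tail: {p_f:.4f}")
```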


## Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.


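In [29]:
def calculate_distribution(df1, df2):
    """Return the (mean, variance) of an F-distribution with df1 and df2 degrees of freedom."""
    # The mean exists only for df2 > 2 and the variance only for df2 > 4
    if df2 <= 4:
        raise ValueError("df2 must be greater than 4 for the variance to be defined")
    mean = df2 / (df2 - 2)
    variance = (2 * df2**2 * (df1 + df2 - 2)) / (df1 * (df2 - 2)**2 * (df2 - 4))
    return (mean, variance)

In [30]:
mean, variance = calculate_distribution(6, 11)
print(f"(Mean, Variance): ({mean:.4f}, {variance:.4f})")

(Mean, Variance): (1.2222, 1.0670)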


Source: 
1. https://www.sciencedirect.com/topics/mathematics/f-distribution
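
The closed-form moments can be cross-checked against SciPy, whose `scipy.stats.f.stats` returns the mean and variance of F(dfn, dfd) when called with `moments='mv'`. The sketch below restates the textbook formulas in a helper (the name `f_mean_variance` is illustrative) so that it runs on its own.

```python
import scipy.stats as stats


def f_mean_variance(df1, df2):
    # Closed-form moments of F(df1, df2); valid for df2 > 4
    mean = df2 / (df2 - 2)
    variance = (2 * df2**2 * (df1 + df2 - 2)) / (df1 * (df2 - 2)**2 * (df2 - 4))
    return mean, variance


for df1, df2 in [(6, 11), (3, 20), (10, 5)]:
    mean, variance = f_mean_variance(df1, df2)
    scipy_mean, scipy_var = stats.f.stats(df1, df2, moments='mv')
    print(f"F({df1}, {df2}): formula ({mean:.4f}, {variance:.4f}), "
          f"SciPy ({float(scipy_mean):.4f}, {float(scipy_var):.4f})")
```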

## Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.


In [31]:
sample_size_1 = 10
sample_size_2 = 15

sample_variance_1 = 25
sample_variance_2 = 20

# Significance level of 10%
alpha = 0.10

In [32]:
f_value = sample_variance_1 / sample_variance_2

In [33]:
df1 = sample_size_1 - 1
df2 = sample_size_2 - 1

In [36]:
import scipy.stats as stat

# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stat.f.cdf(f_value, df1, df2), stat.f.sf(f_value, df1, df2))

In [38]:
print(f"F-statistic: {f_value:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis that the two population variances are equal")
else:
    print("Fail to reject the null hypothesis that the two population variances are equal")

F-statistic: 1.2500
p-value: 0.6832
Fail to reject the null hypothesis that the two population variances are equal
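
A confidence interval for the variance ratio σ1²/σ2² carries the same information as the test: the two-tailed test at level α rejects exactly when the (1 - α) interval excludes 1. A sketch for the Q7 summary statistics (hard-coded here), based on the fact that (S1²/S2²)(σ2²/σ1²) follows F(df1, df2):

```python
import scipy.stats as stats

s1_2, n1 = 25, 10
s2_2, n2 = 20, 15
alpha = 0.10

f_value = s1_2 / s2_2
df1, df2 = n1 - 1, n2 - 1

# Invert the central (1 - alpha) quantiles of F(df1, df2)
ci_low = f_value / stats.f.ppf(1 - alpha / 2, df1, df2)
ci_high = f_value / stats.f.ppf(alpha / 2, df1, df2)
print(f"90% CI for sigma1^2 / sigma2^2: ({ci_low:.4f}, {ci_high:.4f})")
```

Because the interval contains 1, the data are consistent with equal variances at the 10% level.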


## Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.


In [1]:
import numpy as np
import scipy.stats as stat

In [2]:
res_a_data = [24, 25, 28, 23, 22, 20, 27]
res_b_data = [31, 33, 35, 30, 32, 36]

# Significance level of 5%
alpha = 0.05

In [3]:
# Sample variances (ddof=1 divides by n - 1)
var_a = np.var(res_a_data, ddof=1)
var_b = np.var(res_b_data, ddof=1)

In [4]:
f_value = var_a / var_b

In [5]:
df_a = len(res_a_data) - 1
df_b = len(res_b_data) - 1

In [8]:
# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stat.f.cdf(f_value, df_a, df_b), stat.f.sf(f_value, df_a, df_b))

In [9]:
print(f"F-statistic: {f_value:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis that the two population variances are equal")
else:
    print("Fail to reject the null hypothesis that the two population variances are equal")

F-statistic: 1.4552
p-value: 0.6975
Fail to reject the null hypothesis that the two population variances are equal
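
`np.var` divides by n by default (`ddof=0`, the population variance), whereas the F-test is defined in terms of sample variances, which divide by n - 1 (`ddof=1`). With equal sample sizes the correction cancels in the ratio, but with the unequal sizes here (7 and 6 observations) it changes the F-statistic:

```python
import numpy as np

res_a_data = [24, 25, 28, 23, 22, 20, 27]
res_b_data = [31, 33, 35, 30, 32, 36]

# Ratio of population variances (divide by n)
f_population = np.var(res_a_data) / np.var(res_b_data)
# Ratio of sample variances (divide by n - 1), as the F-test requires
f_sample = np.var(res_a_data, ddof=1) / np.var(res_b_data, ddof=1)

print(f"F with ddof=0: {f_population:.4f}")
print(f"F with ddof=1: {f_sample:.4f}")
```

This prints 1.4968 for `ddof=0` and 1.4552 for `ddof=1`.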


## Q9. The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

In [1]:
import numpy as np
import scipy.stats as stat

In [2]:
student_a = [80, 85, 90, 92, 87, 83]
student_b = [75, 78, 82, 79, 81, 84]

# Significance level of 1%
alpha = 0.01

In [4]:
df_a = len(student_a) - 1
df_b = len(student_b) - 1

In [5]:
# Sample variances (ddof=1 divides by n - 1)
var_a = np.var(student_a, ddof=1)
var_b = np.var(student_b, ddof=1)

In [6]:
f_value = var_a / var_b

In [7]:
# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(stat.f.cdf(f_value, df_a, df_b), stat.f.sf(f_value, df_a, df_b))

In [9]:
print(f"F-statistic: {f_value:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis that the variances of Group A and Group B are equal")
else:
    print("Fail to reject the null hypothesis that the variances of Group A and Group B are equal")

F-statistic: 1.9443
p-value: 0.4831
Fail to reject the null hypothesis that the variances of Group A and Group B are equal
