ASSIGNMENT:STATISTICS-10

1.  Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio 
test. The function should return the F-value and the corresponding p-value for the test.

In [1]:
import numpy as np
from scipy.stats import f

def variance_ratio_test(data1, data2):
    n1 = len(data1)
    n2 = len(data2)
    df1 = n1 - 1
    df2 = n2 - 1
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)
    f_value = var1 / var2
    p_value = f.sf(f_value, df1, df2)  # upper-tail p-value: P(F >= f_value)
    return f_value, p_value


In [2]:
data1 = [1, 2, 3, 4, 5]
data2 = [6, 7, 8, 9, 10]
f_value, p_value = variance_ratio_test(data1, data2)
print("F-value:", f_value)
print("p-value:", p_value)


F-value: 1.0
p-value: 0.5
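
The function above reports a single-tail probability. When the alternative is that the variances differ in either direction, one common convention doubles the smaller tail area. A minimal sketch under that assumption; the name `two_sided_variance_ratio_test` is illustrative and not part of the assignment:

```python
import numpy as np
from scipy.stats import f

def two_sided_variance_ratio_test(data1, data2):
    """Return (F, two-sided p-value) for H0: equal population variances."""
    df1 = len(data1) - 1
    df2 = len(data2) - 1
    f_value = np.var(data1, ddof=1) / np.var(data2, ddof=1)
    # Double the smaller tail area and cap the result at 1
    lower_tail = f.cdf(f_value, df1, df2)
    upper_tail = f.sf(f_value, df1, df2)
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return f_value, p_value

f_value, p_value = two_sided_variance_ratio_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print("F-value:", f_value)
print("Two-sided p-value:", p_value)
```

With identical sample variances the two tails are equal, so the two-sided p-value is close to 1 and gives no evidence against equal variances.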


2.  Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an 
F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

In [3]:
from scipy.stats import f

def critical_f(num_df, denom_df, alpha=0.05):
    """
    Calculates the critical F-value for a two-tailed test with a given
    significance level (default is 0.05) and degrees of freedom for the 
    numerator and denominator of an F-distribution.
    
    Args:
    - num_df (int): degrees of freedom for numerator
    - denom_df (int): degrees of freedom for denominator
    - alpha (float): significance level (default is 0.05)
    
    Returns:
    - crit_f (float): upper critical F-value (alpha/2 in the upper tail)
    """
    crit_f = f.isf(alpha/2, num_df, denom_df)
    return crit_f


In [4]:
critical_f(2, 25, 0.05)


4.290932366996311
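
A two-tailed test has a rejection region in each tail, and the function above returns only the upper bound. The sketch below returns both bounds; `critical_f_bounds` is a hypothetical helper name. It also illustrates the reciprocal identity: the lower critical value of F(df1, df2) equals one over the upper critical value of F(df2, df1), with the degrees of freedom swapped.

```python
from scipy.stats import f

def critical_f_bounds(num_df, denom_df, alpha=0.05):
    """Return (lower, upper) critical F-values for a two-tailed test."""
    lower = f.ppf(alpha / 2, num_df, denom_df)  # alpha/2 in the lower tail
    upper = f.isf(alpha / 2, num_df, denom_df)  # alpha/2 in the upper tail
    return lower, upper

lower, upper = critical_f_bounds(2, 25, 0.05)
print("Lower critical value:", lower)
print("Upper critical value:", upper)

# Reciprocal identity: note the swapped degrees of freedom
print("1 / upper critical value of F(25, 2):", 1 / f.isf(0.025, 25, 2))
```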

3. Write a Python program that generates random samples from two normal distributions with known 
variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value.


In [1]:
import numpy as np
from scipy.stats import f

# Set the parameters
mean1 = 0
mean2 = 2
var1 = 1
var2 = 1.5
n1 = 30
n2 = 40

# Generate the samples
x1 = np.random.normal(mean1, np.sqrt(var1), n1)
x2 = np.random.normal(mean2, np.sqrt(var2), n2)

# Compute the F-test
F = np.var(x1, ddof=1) / np.var(x2, ddof=1)
df1 = n1 - 1
df2 = n2 - 1
# One-sided (upper-tail) p-value; a two-sided test would use 2 * min(cdf, 1 - cdf)
p_value = 1 - f.cdf(F, df1, df2)

# Output the results
print("F-value:", F)
print("Degrees of freedom:", df1, ",", df2)
print("P-value:", p_value)


F-value: 0.9716509749039276
Degrees of freedom: 29 , 39
P-value: 0.5259884019372651
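
The samples above are drawn without a seed, so every run prints different numbers. A reproducible variant might look like the sketch below. It uses NumPy's `default_rng` with an arbitrarily chosen seed of 42 (an assumption, not part of the assignment) and reports a two-sided p-value, since the question asks whether the variances are equal.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility

# Same parameters as above: variances 1 and 1.5, sample sizes 30 and 40
x1 = rng.normal(0, np.sqrt(1), 30)
x2 = rng.normal(2, np.sqrt(1.5), 40)

F = np.var(x1, ddof=1) / np.var(x2, ddof=1)
df1, df2 = len(x1) - 1, len(x2) - 1

# Two-sided p-value: double the smaller tail area, capped at 1
p_two_sided = min(1.0, 2 * min(f.cdf(F, df1, df2), f.sf(F, df1, df2)))

print("F-value:", F)
print("Degrees of freedom:", df1, ",", df2)
print("Two-sided p-value:", p_two_sided)
```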


4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from 
each population. Conduct an F-test at the 5% significance level to determine if the variances are 
significantly different.

To conduct an F-test to determine if the variances of two populations are significantly different, we need to set up the null and alternative hypotheses:

Null hypothesis: The variances of the two populations are equal.
Alternative hypothesis: The variances of the two populations are not equal.

We can use an F-test to test these hypotheses. The F-test statistic is given by:

F = s1^2 / s2^2

where s1^2 and s2^2 are the variances of the samples drawn from the two populations. Under the null hypothesis of equal variances, and assuming both populations are normal, the F-test statistic follows an F-distribution with degrees of freedom (df1, df2) = (n1 - 1, n2 - 1).

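Because the alternative hypothesis is two-sided, the 5% significance level is split between the tails: we reject the null hypothesis if the F-test statistic falls below the lower critical value (the 0.025 quantile) or above the upper critical value (the 0.975 quantile) of the F-distribution with degrees of freedom (df1, df2) = (n1 - 1, n2 - 1).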

Now let's apply this to the problem at hand. The problem gives variances of 10 and 15 and a sample of 12 observations from each population. Since the test is computed from sample statistics, we treat 10 and 15 as the sample variances.

In [8]:
import scipy.stats as stats

var1 = 10
var2 = 15
n1 = 12
n2 = 12
alpha = 0.05

# Calculate the F-test statistic
F = var1 / var2
print("F value: ", F)

# Calculate the two-tailed critical values (alpha/2 in each tail)
df1 = n1 - 1
df2 = n2 - 1
lower_critical = stats.f.ppf(alpha/2, df1, df2)
critical_value = stats.f.ppf(1 - alpha/2, df1, df2)
print("Critical value:", critical_value)
# Reject if F falls in either tail (F < 1 here, so the lower tail is the relevant one)
if F < lower_critical or F > critical_value:
    print("The variances are significantly different.")
else:
    print("The variances are not significantly different.")


F value:  0.6666666666666666
Critical value: 3.473699051085809
The variances are not significantly different.
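
The same decision rule can be packaged for any pair of sample variances and sample sizes. The sketch below is one possible way to do that; `f_test_from_summary` is a hypothetical helper name, and it assumes the inputs are sample variances from normally distributed populations.

```python
from scipy import stats

def f_test_from_summary(var1, n1, var2, n2, alpha=0.05):
    """Two-tailed F-test from sample variances and sample sizes.

    Returns (F, lower_critical, upper_critical, reject_h0).
    """
    df1, df2 = n1 - 1, n2 - 1
    F = var1 / var2
    lower = stats.f.ppf(alpha / 2, df1, df2)      # alpha/2 in the lower tail
    upper = stats.f.ppf(1 - alpha / 2, df1, df2)  # alpha/2 in the upper tail
    reject = bool(F < lower or F > upper)
    return F, lower, upper, reject

F, lower, upper, reject = f_test_from_summary(10, 12, 15, 12, alpha=0.05)
print("F value:", F)
print("Critical values:", lower, upper)
print("Reject H0:", reject)
```

Because both samples have the same size here, the lower critical value is simply the reciprocal of the upper one.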


5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 
products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance 
level to determine if the claim is justified.

In [11]:
import scipy.stats as stats

var_claimed = 0.005
var_sample = 0.006
n = 25
alpha = 0.01

# Calculate the F-test statistic
F = var_sample / var_claimed
print("F value: ", F)

# Calculate the critical value
df1 = n - 1
# The claimed variance is a fixed constant, so the denominator has infinite
# degrees of freedom; F(df1, inf) is a chi-square(df1) variable divided by df1
critical_value = stats.chi2.ppf(1 - alpha, df1) / df1

# Compare F-test statistic with critical value and print result
# (one-sided test: is the true variance larger than claimed?)
if F > critical_value:
    print("The manufacturer's claim is not justified at the 1% significance level.")
else:
    print("The manufacturer's claim is justified at the 1% significance level.")

# Print critical value
print("Critical value:", round(critical_value, 2))


F value:  1.2
The manufacturer's claim is justified at the 1% significance level.
Critical value: 1.79
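
A test of one sample variance against a claimed value is usually written as a chi-square test: under the null hypothesis, (n - 1) * s^2 / sigma0^2 follows a chi-square distribution with n - 1 degrees of freedom. This is the limiting form of the F-test as the denominator degrees of freedom grow without bound. A minimal sketch, assuming a normal population and a one-sided alternative (true variance larger than claimed):

```python
from scipy import stats

var_claimed = 0.005
var_sample = 0.006
n = 25
alpha = 0.01

df = n - 1
chi2_stat = df * var_sample / var_claimed       # (n - 1) * s^2 / sigma0^2
critical_value = stats.chi2.ppf(1 - alpha, df)  # upper-tail critical value
p_value = stats.chi2.sf(chi2_stat, df)          # upper-tail p-value

print("Chi-square statistic:", chi2_stat)
print("Critical value:", critical_value)
print("p-value:", p_value)

if chi2_stat > critical_value:
    print("Reject the claimed variance at the 1% significance level.")
else:
    print("No evidence against the claimed variance at the 1% significance level.")
```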


6.  Write a Python function that takes in the degrees of freedom for the numerator and denominator of an 
F-distribution and calculates the mean and variance of the distribution. The function should return the 
mean and variance as a tuple.

In [12]:
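def f_distribution_mean_and_variance(df1, df2):
    if df1 <= 0 or df2 <= 0:
        raise ValueError("Degrees of freedom must be positive")
    # The mean exists only for df2 > 2 and the variance only for df2 > 4
    if df2 <= 4:
        raise ValueError("df2 must be greater than 4 for the variance to exist")
    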
    mean = df2 / (df2 - 2)
    variance = (2 * df2 ** 2 * (df1 + df2 - 2)) / (df1 * (df2 - 2) ** 2 * (df2 - 4))
    
    return (mean, variance)


The function takes two arguments, df1 and df2, which are the degrees of freedom for the numerator and denominator of an F-distribution. It first checks that both degrees of freedom are positive and raises a ValueError if either one is not.

The mean and variance of the F-distribution are then calculated using the following formulas:

Mean: df2 / (df2 - 2), defined only for df2 > 2
Variance: (2 * df2 ** 2 * (df1 + df2 - 2)) / (df1 * (df2 - 2) ** 2 * (df2 - 4)), defined only for df2 > 4

The function returns the mean and variance as a tuple.

In [14]:
f_distribution_mean_and_variance(5,8)

(1.3333333333333333, 1.9555555555555555)
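
One way to sanity-check the closed-form formulas is to compare them with SciPy's own moments for the F-distribution. The sketch below uses `scipy.stats.f.stats` with `moments="mv"` (mean and variance) for the same df1 = 5 and df2 = 8.

```python
from scipy.stats import f

df1, df2 = 5, 8

# Closed-form expressions, valid for df2 > 4
mean_formula = df2 / (df2 - 2)
var_formula = (2 * df2 ** 2 * (df1 + df2 - 2)) / (df1 * (df2 - 2) ** 2 * (df2 - 4))

# SciPy's mean and variance for the same distribution
mean_scipy, var_scipy = f.stats(df1, df2, moments="mv")

print("Formula:", mean_formula, var_formula)
print("SciPy:  ", float(mean_scipy), float(var_scipy))
```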

7. A random sample of 10 measurements is taken from a normal population with unknown variance. The 
sample variance is found to be 25. Another random sample of 15 measurements is taken from another 
normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test 
at the 10% significance level to determine if the variances are significantly different.

In [15]:
import scipy.stats as stats

n1 = 10
n2 = 15
var1 = 25
var2 = 20
alpha = 0.1

# Calculate the F-test statistic
F = var1 / var2

print("Fvalue: ",F)

# Calculate the degrees of freedom
df1 = n1 - 1
df2 = n2 - 1

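# Calculate the critical values (alpha/2 in each tail); with unequal degrees of
# freedom the lower critical value is not 1 / critical_value
lower_critical = stats.f.ppf(alpha/2, df1, df2)
critical_value = stats.f.ppf(1 - alpha/2, df1, df2)

# Compare F-test statistic with critical values and print result
if F < lower_critical or F > critical_value: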
    print("The variances are significantly different at the 10% significance level.")
else:
    print("There is not enough evidence to suggest that the variances are significantly different at the 10% significance level.")

# Print critical value
print("Critical value:", critical_value)


Fvalue:  1.25
There is not enough evidence to suggest that the variances are significantly different at the 10% significance level.
Critical value: 2.6457907352338195
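
An equivalent way to read the same test is through a confidence interval for the ratio of the population variances: dividing the observed F by the upper and lower critical values gives a 90% interval, and the null hypothesis is rejected at the 10% level only if that interval excludes 1. A minimal sketch, assuming normal populations:

```python
from scipy import stats

n1, n2 = 10, 15
var1, var2 = 25, 20
alpha = 0.1

df1, df2 = n1 - 1, n2 - 1
F = var1 / var2

f_lower = stats.f.ppf(alpha / 2, df1, df2)
f_upper = stats.f.ppf(1 - alpha / 2, df1, df2)

# Interval for sigma1^2 / sigma2^2 at confidence level 1 - alpha
ci_low, ci_high = F / f_upper, F / f_lower

print("90% CI for the variance ratio:", (ci_low, ci_high))
print("Interval contains 1:", bool(ci_low <= 1 <= ci_high))
```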


8. The following data represent the waiting times in minutes at two different restaurants on a Saturday 
night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% 
significance level to determine if the variances are significantly different.

In [17]:
import numpy as np
import scipy.stats as stats

data1 = np.array([24, 25, 28, 23, 22, 20, 27])
data2 = np.array([31, 33, 35, 30, 32, 36])

df1 = len(data1) - 1
df2 = len(data2) - 1

alpha = 0.05

# Sample variances (ddof=1 gives the unbiased estimator)
var1 = np.var(data1, ddof=1)
var2 = np.var(data2, ddof=1)

# Calculate the F-test statistic with the larger variance in the numerator,
# keeping the degrees of freedom in the matching order
if var1 > var2:
    F, df_num, df_den = var1 / var2, df1, df2
else:
    F, df_num, df_den = var2 / var1, df2, df1

print("F-value:", round(F, 4))

# Calculate the upper critical value for a two-tailed test
critical_value = stats.f.ppf(1 - alpha / 2, df_num, df_den)

# F >= 1 by construction, so only the upper tail needs to be checked
if F > critical_value:
    print("The variances are significantly different at the", alpha * 100, "% significance level.")
else:
    print("There is not enough evidence to suggest that the variances are significantly different at the", alpha * 100, "% significance level.")

# Print critical value
print("Critical value:", critical_value)


F-value: 1.4552
There is not enough evidence to suggest that the variances are significantly different at the 5.0 % significance level.
Critical value: 6.977701858535566


9.  The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; 
Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances 
are significantly different.

In [19]:
import numpy as np
import scipy.stats as stats

data1 = np.array([80, 85, 90, 92, 87, 83])
data2 = np.array([75, 78, 82, 79, 81, 84])

df1 = len(data1) - 1
df2 = len(data2) - 1

alpha = 0.01

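# Sample variances (ddof=1 gives the unbiased estimator)
var1 = np.var(data1, ddof=1)
var2 = np.var(data2, ddof=1)

# Calculate the F-test statistic with the larger variance in the numerator,
# keeping the degrees of freedom in the matching order
if var1 > var2:
    F, df_num, df_den = var1 / var2, df1, df2
else:
    F, df_num, df_den = var2 / var1, df2, df1

print("F-value:", F)

# Calculate the upper critical value for a two-tailed test
critical_value = stats.f.ppf(1 - alpha / 2, df_num, df_den)

# F >= 1 by construction, so only the upper tail needs to be checked
if F > critical_value: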
    print("The variances are significantly different at the", alpha * 100, "% significance level.")
else:
    print("There is not enough evidence to suggest that the variances are significantly different at the", alpha * 100, "% significance level.")

# Print critical value
print("Critical value:", critical_value)


F-value: 1.9442622950819677
There is not enough evidence to suggest that the variances are significantly different at the 1.0 % significance level.
Critical value: 14.939605459912224
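
Questions 8 and 9 follow the same steps, so they can be wrapped in one helper. The sketch below is one possible version; `f_test_larger_over_smaller` is a hypothetical name. It assumes sample variances computed with ddof=1 and places the larger variance in the numerator, so only the upper critical value at alpha/2 is needed.

```python
import numpy as np
from scipy import stats

def f_test_larger_over_smaller(sample_a, sample_b, alpha=0.05):
    """Two-tailed F-test with the larger sample variance in the numerator.

    Returns (F, upper_critical_value, reject_h0).
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1)
    # Keep the degrees of freedom in the same order as the variances
    if var_a >= var_b:
        F, df_num, df_den = var_a / var_b, len(a) - 1, len(b) - 1
    else:
        F, df_num, df_den = var_b / var_a, len(b) - 1, len(a) - 1
    critical = stats.f.ppf(1 - alpha / 2, df_num, df_den)
    return F, critical, bool(F > critical)

restaurants = f_test_larger_over_smaller(
    [24, 25, 28, 23, 22, 20, 27], [31, 33, 35, 30, 32, 36], alpha=0.05
)
groups = f_test_larger_over_smaller(
    [80, 85, 90, 92, 87, 83], [75, 78, 82, 79, 81, 84], alpha=0.01
)
print("Restaurants (F, critical value, reject):", restaurants)
print("Student groups (F, critical value, reject):", groups)
```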
