
Q1. Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio  test. The function should return the F-value and the corresponding p-value for the test.

Answer:

The function below computes the F-value and p-value for a variance ratio test, which is used to compare the variances of two samples. The F-value is the larger sample variance divided by the smaller, and the two-tailed p-value is computed from the cumulative distribution function (CDF) of the F-distribution.

In [9]:
import numpy as np
from scipy.stats import f

def variance_ratio_test(data1, data2):
    # Convert lists to NumPy arrays to use the var() method
    data1 = np.array(data1)
    data2 = np.array(data2)

    # Calculate variances of the two data sets
    var1 = data1.var(ddof=1)  # Sample variance of data1
    var2 = data2.var(ddof=1)  # Sample variance of data2

    # Calculate the F-value with the larger variance in the numerator,
    # keeping each degrees-of-freedom value paired with its own sample
    if var1 >= var2:
        F_value = var1 / var2
        dfn, dfd = len(data1) - 1, len(data2) - 1
    else:
        F_value = var2 / var1
        dfn, dfd = len(data2) - 1, len(data1) - 1

    # Calculate the two-tailed p-value
    p_value = 2 * min(f.cdf(F_value, dfn, dfd), 1 - f.cdf(F_value, dfn, dfd))

    return F_value, p_value

In [10]:
data1 = [10, 12, 23, 23, 16, 23, 21, 16]
data2 = [14, 15, 20, 22, 24, 18, 20, 19]

F_value, p_value = variance_ratio_test(data1, data2)
print("F-value:", F_value)
print("p-value:", p_value)



F-value: 2.4615384615384617
p-value: 0.2576091573279271
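
As a quick sanity check, a two-tailed variance ratio test should give the same answer regardless of which sample is passed first, provided each degrees-of-freedom value follows the variance it belongs to. The sketch below uses a hypothetical helper `two_sided_f_test` (not part of the original notebook) and an assumed second sample of a different size to illustrate this.

```python
import numpy as np
from scipy.stats import f

def two_sided_f_test(x, y):
    """Hypothetical helper: F = larger variance / smaller variance, two-tailed p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)

    # Put the larger variance on top and carry its degrees of freedom with it
    if var_x >= var_y:
        F, dfn, dfd = var_x / var_y, len(x) - 1, len(y) - 1
    else:
        F, dfn, dfd = var_y / var_x, len(y) - 1, len(x) - 1

    # Two-tailed p-value: twice the smaller tail probability, capped at 1
    p = 2 * min(f.cdf(F, dfn, dfd), f.sf(F, dfn, dfd))
    return F, min(p, 1.0)

# Illustrative samples of unequal size (assumed data, not from the exercise)
a = [10, 12, 23, 23, 16, 23, 21, 16]
b = [14, 15, 20, 22, 24]

F_ab, p_ab = two_sided_f_test(a, b)
F_ba, p_ba = two_sided_f_test(b, a)
print("a vs b:", F_ab, p_ab)
print("b vs a:", F_ba, p_ba)  # matches the line above
```

If the degrees of freedom were instead fixed to the argument order, the two calls could disagree whenever the sample sizes differ.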


Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

Answer:

To calculate the critical F-value for a two-tailed test given a significance level and degrees of freedom, we can use the inverse cumulative distribution function (percent point function, ppf) of the F-distribution. For a two-tailed test with significance level $\alpha = 0.05$, we split it into two tails, each with $\alpha/2 = 0.025$.

In [12]:
# Here's a Python function to compute the critical F-value for a two-tailed test:

from scipy.stats import f

def critical_f_value(alpha, dfn, dfd):
    # Divide alpha by 2 for a two-tailed test
    alpha_tail = alpha / 2

    # Calculate the critical F-values for both tails
    f_critical_low = f.ppf(alpha_tail, dfn, dfd)      # Left tail
    f_critical_high = f.ppf(1 - alpha_tail, dfn, dfd)  # Right tail

    return f_critical_low, f_critical_high


In [13]:
alpha = 0.05
dfn = 5  # Degrees of freedom for the numerator
dfd = 10  # Degrees of freedom for the denominator

f_critical_low, f_critical_high = critical_f_value(alpha, dfn, dfd)
print("Lower critical F-value:", f_critical_low)
print("Upper critical F-value:", f_critical_high)


Lower critical F-value: 0.15107670102998205
Upper critical F-value: 4.236085668188633
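
The two critical values are linked by a reciprocal identity of the F-distribution: the lower critical value for (dfn, dfd) equals one over the upper critical value for (dfd, dfn). The short check below is a sketch that uses the same `scipy.stats.f.ppf` call as above to illustrate this for the example degrees of freedom.

```python
from scipy.stats import f

alpha, dfn, dfd = 0.05, 5, 10

# Lower critical value with the original degrees of freedom
lower = f.ppf(alpha / 2, dfn, dfd)

# Upper critical value with numerator and denominator swapped
upper_swapped = f.ppf(1 - alpha / 2, dfd, dfn)

print("Lower critical value:", lower)
print("1 / upper critical value with swapped df:", 1 / upper_swapped)
```

This is why printed F-tables usually list only upper-tail values: the lower tail can be recovered from them.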


Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value for the test.

Answer:

Steps in the program:

Generate two samples from normal distributions with specified means and variances.

Compute the sample variances of the two samples.

Perform an F-test using the variance_ratio_test function to obtain the F-value, degrees of freedom, and p-value.

In [14]:
import numpy as np
from scipy.stats import f

# Function to perform the variance ratio test (F-test)
def variance_ratio_test(data1, data2):
    # Calculate variances of the two data sets
    var1 = np.var(data1, ddof=1)  # Sample variance of data1
    var2 = np.var(data2, ddof=1)  # Sample variance of data2

    # Calculate the F-value with the larger variance in the numerator,
    # keeping each degrees-of-freedom value paired with its own sample
    if var1 >= var2:
        F_value = var1 / var2
        dfn, dfd = len(data1) - 1, len(data2) - 1
    else:
        F_value = var2 / var1
        dfn, dfd = len(data2) - 1, len(data1) - 1

    # Calculate the two-tailed p-value
    p_value = 2 * min(f.cdf(F_value, dfn, dfd), 1 - f.cdf(F_value, dfn, dfd))

    return F_value, dfn, dfd, p_value

# Parameters for the normal distributions
mean1, mean2 = 50, 50
variance1, variance2 = 25, 30
size1, size2 = 30, 30

# Generate random samples from normal distributions
np.random.seed(0)  # For reproducibility
sample1 = np.random.normal(mean1, np.sqrt(variance1), size1)
sample2 = np.random.normal(mean2, np.sqrt(variance2), size2)

# Perform the F-test
F_value, dfn, dfd, p_value = variance_ratio_test(sample1, sample2)

# Output results
print("F-value:", F_value)
print("Degrees of freedom (numerator):", dfn)
print("Degrees of freedom (denominator):", dfd)
print("p-value:", p_value)

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: Variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")


F-value: 1.207103173271872
Degrees of freedom (numerator): 29
Degrees of freedom (denominator): 29
p-value: 0.6156036545435479
Fail to reject the null hypothesis: No significant difference in variances.
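
A single pair of samples says little about how the test behaves in general. As a rough sketch, the simulation below repeats the experiment many times and estimates how often the two-tailed F-test rejects at the 5% level. The helper name `rejection_rate`, the number of repetitions, and the seed are illustrative assumptions, not part of the original program.

```python
import numpy as np
from scipy.stats import f

def rejection_rate(var1, var2, n1, n2, alpha=0.05, reps=2000, seed=0):
    """Hypothetical helper: fraction of simulated experiments in which H0 is rejected."""
    rng = np.random.default_rng(seed)

    # Each row is one simulated experiment
    x = rng.normal(50, np.sqrt(var1), size=(reps, n1))
    y = rng.normal(50, np.sqrt(var2), size=(reps, n2))

    # Variance ratio per experiment, with degrees of freedom in matching order
    F = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
    p = 2 * np.minimum(f.cdf(F, n1 - 1, n2 - 1), f.sf(F, n1 - 1, n2 - 1))
    return float(np.mean(p < alpha))

rate_equal = rejection_rate(25, 25, 30, 30)    # H0 true: should be near alpha
rate_unequal = rejection_rate(25, 30, 30, 30)  # the setting used above
print("Rejection rate, equal variances:", rate_equal)
print("Rejection rate, variances 25 vs 30:", rate_unequal)
```

With variances as close as 25 and 30 and only 30 observations per sample, a low rejection rate is to be expected, which is consistent with the single run above failing to reject.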


Q4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

Answer:

To conduct an F-test at the 5% significance level, we treat the two given variances (10 and 15) as the variance estimates for the two samples and use the following procedure:

Given Information:

Population variances: $\sigma_1^2 = 10$ and $\sigma_2^2 = 15$

Sample sizes from each population: $n_1 = 12$ and $n_2 = 12$

Significance level: $\alpha = 0.05$

Determine degrees of freedom:

Degrees of freedom for the numerator: $df_1 = n_1 - 1 = 11$

Degrees of freedom for the denominator: $df_2 = n_2 - 1 = 11$

Find the critical F-value:

For a two-tailed test with $\alpha = 0.05$, we look up the critical F-values at the 0.025 and 0.975 quantiles for $df_1 = 11$ and $df_2 = 11$.

Compare the calculated F-value to the critical F-values.

In [15]:
from scipy.stats import f

# Given data
variance1 = 10
variance2 = 15
n1 = 12
n2 = 12
alpha = 0.05

# Calculate the F-value
F_value = variance2 / variance1

# Degrees of freedom
dfn = n1 - 1  # Degrees of freedom for numerator
dfd = n2 - 1  # Degrees of freedom for denominator

# Calculate critical F-values for a two-tailed test
f_critical_low = f.ppf(alpha / 2, dfn, dfd)     # Left tail
f_critical_high = f.ppf(1 - alpha / 2, dfn, dfd)  # Right tail

# Output results
print("Calculated F-value:", F_value)
print("Degrees of freedom (numerator):", dfn)
print("Degrees of freedom (denominator):", dfd)
print("Lower critical F-value:", f_critical_low)
print("Upper critical F-value:", f_critical_high)

# Interpretation
if F_value < f_critical_low or F_value > f_critical_high:
    print("Reject the null hypothesis: Variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")


Calculated F-value: 1.5
Degrees of freedom (numerator): 11
Degrees of freedom (denominator): 11
Lower critical F-value: 0.28787755798459863
Upper critical F-value: 3.473699051085809
Fail to reject the null hypothesis: No significant difference in variances.
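
The critical-value decision can be cross-checked with a p-value. The sketch below reuses the numbers from this question and computes the two-tailed p-value as twice the smaller tail probability; it is an alternative view of the same test, not a different method.

```python
from scipy.stats import f

F_value = 15 / 10
dfn, dfd = 11, 11
alpha = 0.05

# Two-tailed p-value: twice the smaller of the two tail probabilities
p_value = 2 * min(f.cdf(F_value, dfn, dfd), f.sf(F_value, dfn, dfd))

print("p-value:", p_value)
print("Reject H0:", p_value < alpha)
```

The two approaches always agree: the F-value lies inside the critical range exactly when the p-value is at least alpha.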


Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.

Answer:

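To test whether the sample variance differs significantly from the claimed population variance, we follow the steps below. Note that the code treats the claimed variance as if it had the same degrees of freedom as the sample, which is only an approximation; the exact test of a single variance against a fixed value uses the chi-square statistic $(n-1)s^2/\sigma^2$.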

Given Information:

Claimed variance ($\sigma^2$) = 0.005

Sample variance ($s^2$) = 0.006

Sample size ($n$) = 25

Significance level ($\alpha$) = 0.01

Hypotheses:

Null Hypothesis ($H_0$): The population variance is equal to 0.005, i.e., $\sigma^2 = 0.005$.

Alternative Hypothesis ($H_1$): The population variance is different from 0.005, i.e., $\sigma^2 \neq 0.005$.

This is a two-tailed test, as we are checking whether the population variance differs from the claimed variance in either direction (higher or lower).

In [16]:
from scipy.stats import f

# Given data
claimed_variance = 0.005
sample_variance = 0.006
n = 25
alpha = 0.01

# Calculate the F-value
F_value = sample_variance / claimed_variance

# Degrees of freedom
dfn = n - 1  # Degrees of freedom for numerator (sample)

# Calculate critical F-values for a two-tailed test
f_critical_low = f.ppf(alpha / 2, dfn, dfn)     # Left tail
f_critical_high = f.ppf(1 - alpha / 2, dfn, dfn)  # Right tail

# Output results
print("Calculated F-value:", F_value)
print("Degrees of freedom:", dfn)
print("Lower critical F-value:", f_critical_low)
print("Upper critical F-value:", f_critical_high)

# Interpretation
if F_value < f_critical_low or F_value > f_critical_high:
    print("Reject the null hypothesis: The variance is significantly different from 0.005.")
else:
    print("Fail to reject the null hypothesis: The variance is not significantly different from 0.005.")


Calculated F-value: 1.2
Degrees of freedom: 24
Lower critical F-value: 0.3370701342685674
Upper critical F-value: 2.966741631292762
Fail to reject the null hypothesis: The variance is not significantly different from 0.005.
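
Because the claimed variance is a fixed number rather than an estimate from a second sample, the standard exact test here is the chi-square test for a single variance. The sketch below applies it to the same data; it is offered as a cross-check under the usual normality assumption, and the variable names are illustrative.

```python
from scipy.stats import chi2

claimed_variance = 0.005
sample_variance = 0.006
n = 25
alpha = 0.01

# Test statistic: (n - 1) * s^2 / sigma0^2, chi-square with n - 1 degrees of freedom under H0
df = n - 1
chi2_stat = df * sample_variance / claimed_variance

# Two-tailed critical values and p-value
chi2_low = chi2.ppf(alpha / 2, df)
chi2_high = chi2.ppf(1 - alpha / 2, df)
p_value = 2 * min(chi2.cdf(chi2_stat, df), chi2.sf(chi2_stat, df))

reject = chi2_stat < chi2_low or chi2_stat > chi2_high
print("Chi-square statistic:", chi2_stat)
print("Critical values:", chi2_low, chi2_high)
print("p-value:", p_value)
print("Reject H0:", reject)
```

The conclusion matches the one above: at the 1% level the sample does not contradict the manufacturer's claim.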


Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.

Answer:



In [17]:
def f_distribution_mean_variance(dfn, dfd):
    # Check if mean and variance can be calculated based on dfd
    mean = None
    variance = None

    # Calculate mean if dfd > 2
    if dfd > 2:
        mean = dfd / (dfd - 2)

    # Calculate variance if dfd > 4
    if dfd > 4:
        variance = (2 * (dfd ** 2) * (dfn + dfd - 2)) / (dfn * ((dfd - 2) ** 2) * (dfd - 4))

    return mean, variance


In [18]:
dfn = 5  # Degrees of freedom for the numerator
dfd = 10  # Degrees of freedom for the denominator

mean, variance = f_distribution_mean_variance(dfn, dfd)
print("Mean:", mean)
print("Variance:", variance)


Mean: 1.25
Variance: 1.3541666666666667
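
For reference, the function above implements the closed forms mean $= d_2/(d_2-2)$ for $d_2 > 2$ and variance $= 2d_2^2(d_1+d_2-2) / \left(d_1(d_2-2)^2(d_2-4)\right)$ for $d_2 > 4$, where $d_1$ and $d_2$ are the numerator and denominator degrees of freedom. As a sanity check, the sketch below compares these formulas with the moments reported by `scipy.stats.f.stats`.

```python
from scipy.stats import f

dfn, dfd = 5, 10

# Closed-form mean and variance of the F-distribution
mean_formula = dfd / (dfd - 2)
var_formula = (2 * dfd**2 * (dfn + dfd - 2)) / (dfn * (dfd - 2) ** 2 * (dfd - 4))

# Moments as computed by SciPy ("mv" = mean and variance)
mean_scipy, var_scipy = f.stats(dfn, dfd, moments="mv")

print("Formula:", mean_formula, var_formula)
print("SciPy:  ", float(mean_scipy), float(var_scipy))
```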


Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.

Answer:

To test if the variances of two independent normal populations are significantly different, we can use an F-test for equality of variances. Here are the steps for conducting this test based on the information provided.

Given Information:

Sample 1: $n_1 = 10$, sample variance $s_1^2 = 25$

Sample 2: $n_2 = 15$, sample variance $s_2^2 = 20$

Significance level: $\alpha = 0.10$

Hypotheses:

Null Hypothesis ($H_0$): The variances are equal, i.e., $\sigma_1^2 = \sigma_2^2$.

Alternative Hypothesis ($H_1$): The variances are different, i.e., $\sigma_1^2 \neq \sigma_2^2$.

Since this is a two-tailed test, we'll split the significance level across both tails, with $\alpha/2 = 0.05$ for each tail.


F-value: The calculated F-value indicates the ratio of the two sample variances.
Critical F-values: Define the rejection region for a two-tailed test at the 10% significance level.

Decision: If the calculated F-value is outside the range defined by the critical F-values, we reject the null hypothesis, indicating that the variances are significantly different. Otherwise, we fail to reject the null hypothesis.

This code will determine if there’s a statistically significant difference between the variances at the 10% significance level.

In [19]:
from scipy.stats import f

# Given data
s1_squared = 25
s2_squared = 20
n1 = 10
n2 = 15
alpha = 0.10

# Calculate the F-value
F_value = s1_squared / s2_squared

# Degrees of freedom
dfn = n1 - 1  # Degrees of freedom for numerator
dfd = n2 - 1  # Degrees of freedom for denominator

# Calculate critical F-values for a two-tailed test
f_critical_low = f.ppf(alpha / 2, dfn, dfd)     # Left tail
f_critical_high = f.ppf(1 - alpha / 2, dfn, dfd)  # Right tail

# Output results
print("Calculated F-value:", F_value)
print("Degrees of freedom (numerator):", dfn)
print("Degrees of freedom (denominator):", dfd)
print("Lower critical F-value:", f_critical_low)
print("Upper critical F-value:", f_critical_high)

# Interpretation
if F_value < f_critical_low or F_value > f_critical_high:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")


Calculated F-value: 1.25
Degrees of freedom (numerator): 9
Degrees of freedom (denominator): 14
Lower critical F-value: 0.3305268601412525
Upper critical F-value: 2.6457907352338195
Fail to reject the null hypothesis: No significant difference in variances.
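
The same quantiles also give a confidence interval for the ratio $\sigma_1^2/\sigma_2^2$, which shows how wide the range of plausible ratios is rather than only a yes/no decision. The sketch below computes the 90% interval from the numbers in this question; it assumes both populations are normal, as stated in the problem.

```python
from scipy.stats import f

s1_squared, s2_squared = 25, 20
n1, n2 = 10, 15
alpha = 0.10

ratio = s1_squared / s2_squared
dfn, dfd = n1 - 1, n2 - 1

# Dividing the observed ratio by the upper and lower F quantiles
# gives the lower and upper confidence limits, respectively
ci_low = ratio / f.ppf(1 - alpha / 2, dfn, dfd)
ci_high = ratio / f.ppf(alpha / 2, dfn, dfd)

print("Observed variance ratio:", ratio)
print("90% confidence interval for sigma1^2 / sigma2^2:", (ci_low, ci_high))
print("Interval contains 1:", ci_low < 1 < ci_high)
```

The interval contains 1 exactly when the two-tailed test at the same level fails to reject, so it agrees with the decision above.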


Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

Answer:

To test if the variances of the waiting times at the two restaurants are significantly different, we can perform an F-test. Here's a step-by-step solution.

Given Data:

Restaurant A: 24, 25, 28, 23, 22, 20, 27

Restaurant B: 31, 33, 35, 30, 32, 36

Significance Level ($\alpha$) = 0.05

Hypotheses:

Null Hypothesis ($H_0$): The variances are equal, i.e., $\sigma_A^2 = \sigma_B^2$.

Alternative Hypothesis ($H_1$): The variances are different, i.e., $\sigma_A^2 \neq \sigma_B^2$.

Since this is a two-tailed test, we'll split the significance level across both tails, with $\alpha/2 = 0.025$ for each tail.

F-value: The calculated F-value is the ratio of the larger variance to the smaller variance.

Critical F-values: These define the rejection region for a two-tailed test at the 5% significance level.

Decision: If the F-value falls outside the critical range, we reject the null hypothesis, indicating that the variances are significantly different. Otherwise, we fail to reject the null hypothesis.

This code will determine if there’s a statistically significant difference between the variances in waiting times at the two restaurants at the 5% significance level.

In [20]:
import numpy as np
from scipy.stats import f

# Given data
data_A = [24, 25, 28, 23, 22, 20, 27]
data_B = [31, 33, 35, 30, 32, 36]
alpha = 0.05

# Calculate sample variances
s_A_squared = np.var(data_A, ddof=1)
s_B_squared = np.var(data_B, ddof=1)

# Determine F-value, placing the larger variance in the numerator
if s_A_squared > s_B_squared:
    F_value = s_A_squared / s_B_squared
    dfn = len(data_A) - 1  # Degrees of freedom for the numerator
    dfd = len(data_B) - 1  # Degrees of freedom for the denominator
else:
    F_value = s_B_squared / s_A_squared
    dfn = len(data_B) - 1
    dfd = len(data_A) - 1

# Calculate critical F-values for a two-tailed test
f_critical_low = f.ppf(alpha / 2, dfn, dfd)     # Left tail
f_critical_high = f.ppf(1 - alpha / 2, dfn, dfd)  # Right tail

# Output results
print("Sample variance for Restaurant A:", s_A_squared)
print("Sample variance for Restaurant B:", s_B_squared)
print("Calculated F-value:", F_value)
print("Degrees of freedom (numerator):", dfn)
print("Degrees of freedom (denominator):", dfd)
print("Lower critical F-value:", f_critical_low)
print("Upper critical F-value:", f_critical_high)

# Interpretation
if F_value < f_critical_low or F_value > f_critical_high:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")


Sample variance for Restaurant A: 7.80952380952381
Sample variance for Restaurant B: 5.366666666666667
Calculated F-value: 1.4551907719609583
Degrees of freedom (numerator): 6
Degrees of freedom (denominator): 5
Lower critical F-value: 0.16701279718024772
Upper critical F-value: 6.977701858535566
Fail to reject the null hypothesis: No significant difference in variances.
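
The F-test for equal variances is known to be sensitive to departures from normality, and waiting times are often skewed. As a hedged cross-check rather than a replacement for the requested F-test, the sketch below runs Levene's test (median-centred, also called the Brown-Forsythe variant) from `scipy.stats` on the same data.

```python
from scipy.stats import levene

data_A = [24, 25, 28, 23, 22, 20, 27]
data_B = [31, 33, 35, 30, 32, 36]
alpha = 0.05

# center="median" makes the test more robust to skewed data
stat, p_value = levene(data_A, data_B, center="median")

print("Levene statistic:", stat)
print("p-value:", p_value)
print("Reject H0:", p_value < alpha)
```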


Q9. The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

Answer:

To determine if the variances of the test scores of two groups are significantly different, we can use an F-test for equality of variances. Here’s a step-by-step solution.

Given Data:

Group A: 80, 85, 90, 92, 87, 83

Group B: 75, 78, 82, 79, 81, 84

Significance Level ($\alpha$) = 0.01

Hypotheses:

Null Hypothesis ($H_0$): The variances are equal, i.e., $\sigma_A^2 = \sigma_B^2$.

Alternative Hypothesis ($H_1$): The variances are different, i.e., $\sigma_A^2 \neq \sigma_B^2$.

Since this is a two-tailed test, we split the significance level across both tails, with $\alpha/2 = 0.005$ for each tail.

F-value: The calculated F-value is the ratio of the larger variance to the smaller variance.

Critical F-values: Define the rejection region for a two-tailed test at the 1% significance level.

Decision: If the F-value falls outside the critical range, we reject the null hypothesis, indicating that the variances are significantly different. Otherwise, we fail to reject the null hypothesis.

This code will output whether the variances in test scores for the two groups are significantly different at the 1% significance level.

In [21]:
import numpy as np
from scipy.stats import f

# Given data
data_A = [80, 85, 90, 92, 87, 83]
data_B = [75, 78, 82, 79, 81, 84]
alpha = 0.01

# Calculate sample variances
s_A_squared = np.var(data_A, ddof=1)
s_B_squared = np.var(data_B, ddof=1)

# Determine F-value, placing the larger variance in the numerator
if s_A_squared > s_B_squared:
    F_value = s_A_squared / s_B_squared
    dfn = len(data_A) - 1  # Degrees of freedom for the numerator
    dfd = len(data_B) - 1  # Degrees of freedom for the denominator
else:
    F_value = s_B_squared / s_A_squared
    dfn = len(data_B) - 1
    dfd = len(data_A) - 1

# Calculate critical F-values for a two-tailed test
f_critical_low = f.ppf(alpha / 2, dfn, dfd)     # Left tail
f_critical_high = f.ppf(1 - alpha / 2, dfn, dfd)  # Right tail

# Output results
print("Sample variance for Group A:", s_A_squared)
print("Sample variance for Group B:", s_B_squared)
print("Calculated F-value:", F_value)
print("Degrees of freedom (numerator):", dfn)
print("Degrees of freedom (denominator):", dfd)
print("Lower critical F-value:", f_critical_low)
print("Upper critical F-value:", f_critical_high)

# Interpretation
if F_value < f_critical_low or F_value > f_critical_high:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in variances.")


Sample variance for Group A: 19.76666666666667
Sample variance for Group B: 10.166666666666666
Calculated F-value: 1.9442622950819677
Degrees of freedom (numerator): 5
Degrees of freedom (denominator): 5
Lower critical F-value: 0.066936171954696
Upper critical F-value: 14.939605459912224
Fail to reject the null hypothesis: No significant difference in variances.
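
The Q8 and Q9 cells repeat the same steps with different data. One way to avoid the duplication is a small helper; the name `f_test_equal_variances` and the dictionary it returns are illustrative assumptions, and the sketch simply wraps the logic already used above.

```python
import numpy as np
from scipy.stats import f

def f_test_equal_variances(sample_a, sample_b, alpha):
    """Hypothetical helper wrapping the two-tailed F-test steps used in Q8 and Q9."""
    var_a = np.var(sample_a, ddof=1)
    var_b = np.var(sample_b, ddof=1)

    # Larger variance in the numerator, with its own degrees of freedom
    if var_a > var_b:
        F_value, dfn, dfd = var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    else:
        F_value, dfn, dfd = var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

    low = f.ppf(alpha / 2, dfn, dfd)
    high = f.ppf(1 - alpha / 2, dfn, dfd)
    reject = bool(F_value < low or F_value > high)
    return {"F": F_value, "dfn": dfn, "dfd": dfd, "low": low, "high": high, "reject": reject}

q8 = f_test_equal_variances([24, 25, 28, 23, 22, 20, 27], [31, 33, 35, 30, 32, 36], alpha=0.05)
q9 = f_test_equal_variances([80, 85, 90, 92, 87, 83], [75, 78, 82, 79, 81, 84], alpha=0.01)
print("Q8:", q8)
print("Q9:", q9)
```

Returning the intermediate values alongside the decision keeps the printed report from the original cells easy to reproduce.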


**Thank You!**