Q1. What is hypothesis testing in statistics?

A1.
Hypothesis testing is a statistical method used to decide whether there is enough evidence in a sample to infer that a certain condition holds for the entire population.
It involves stating a hypothesis, testing it against data, and deciding whether to reject it or fail to reject it.

Q2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

A2.

Null hypothesis (H₀): Assumes there is no effect or difference.

Alternative hypothesis (H₁ or Ha): Assumes there is an effect or difference.
We test the data to see if we have enough evidence to reject H₀ in favor of H₁.

Q3. What is the significance level in hypothesis testing, and why is it important?

A3.
The significance level (α) is the probability of rejecting the null hypothesis when it is actually true (Type 1 error).
Common values: 0.05, 0.01
It sets the threshold for deciding whether a result is statistically significant.

Q4. What does a P-value represent in hypothesis testing?

A4.
A P-value is the probability of getting the observed results (or more extreme) assuming the null hypothesis is true.
It measures how compatible your data is with H₀.

Q5. How do you interpret the P-value in hypothesis testing?

A5.

If P-value ≤ α: Reject the null hypothesis (results are significant).

If P-value > α: Fail to reject the null (not enough evidence).
Smaller P-values mean stronger evidence against H₀.

Q6. What are Type 1 and Type 2 errors in hypothesis testing?

A6.

Type 1 error (α): Rejecting H₀ when it is true.

Type 2 error (β): Failing to reject H₀ when it is false.
Balancing both errors is important for good decision-making.
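
As a rough illustration of the two error types, the sketch below estimates both rates by simulation for a two-tailed Z-test. The function name `simulate_error_rates` and all parameter values (means, σ, sample size, number of trials) are assumptions chosen for the example, not values from the text above.

```python
import numpy as np
from scipy.stats import norm

def simulate_error_rates(mu0=50.0, mu1=52.0, sigma=5.0, n=30, alpha=0.05,
                         trials=20_000, seed=0):
    """Estimate Type I and Type II error rates of a two-tailed Z-test."""
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - alpha / 2)
    se = sigma / np.sqrt(n)

    # Case 1: H0 is true (true mean = mu0). Every rejection is a Type I error.
    means_h0 = rng.normal(mu0, sigma, size=(trials, n)).mean(axis=1)
    type1 = np.mean(np.abs((means_h0 - mu0) / se) > z_crit)

    # Case 2: H0 is false (true mean = mu1). Every non-rejection is a Type II error.
    means_h1 = rng.normal(mu1, sigma, size=(trials, n)).mean(axis=1)
    type2 = np.mean(np.abs((means_h1 - mu0) / se) <= z_crit)
    return type1, type2

type1_rate, type2_rate = simulate_error_rates()
print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The estimated Type I rate should land close to the chosen α, while the Type II rate depends on how far the true mean is from the hypothesized one and on the sample size.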

Q7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

A7.

One-tailed test: Tests for effect in one direction only (e.g., mean > 50).

Two-tailed test: Tests for effect in both directions (e.g., mean ≠ 50).
Use one-tailed when the hypothesis is directional.
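
To make the difference concrete, the following sketch computes the P-value of the same Z statistic under each alternative. The helper `z_test_p_values` and the numbers passed to it are hypothetical, chosen only for illustration.

```python
from scipy.stats import norm

def z_test_p_values(sample_mean, mu0, sigma, n):
    """Return the Z statistic and the P-value under each alternative hypothesis."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return {
        "z": z,
        "two_tailed": 2 * norm.sf(abs(z)),  # H1: mean != mu0
        "right_tailed": norm.sf(z),         # H1: mean > mu0
        "left_tailed": norm.cdf(z),         # H1: mean < mu0
    }

result = z_test_p_values(sample_mean=51.0, mu0=50.0, sigma=4.0, n=36)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

When the observed effect lies in the hypothesized direction, the one-tailed P-value is half the two-tailed one, which is why the direction should be fixed before looking at the data.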

Q8. What is the Z-test, and when is it used in hypothesis testing?

A8.
A Z-test is used when:

The population standard deviation is known

Sample size is large (n ≥ 30)
It compares the sample mean to the population mean using Z-scores.

Q9. How do you calculate the Z-score, and what does it represent in hypothesis testing?

A9.

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

It represents how many standard errors the sample mean is away from the population mean.
Used to determine how extreme the result is.

Q10. What is the T-distribution, and when should it be used instead of the normal distribution?

A10.
The T-distribution is similar to the normal distribution but has heavier tails.
Use it when:

Sample size is small (n < 30)

Population standard deviation is unknown

Q11. What is the difference between a Z-test and a T-test?

A11.

Z-test: Population standard deviation known, large sample

T-test: Population standard deviation unknown, small sample
T-test is more flexible when data is limited.

Q12. What is the T-test, and how is it used in hypothesis testing?

A12.
A T-test checks if there's a significant difference between:

A sample mean and a population mean (one-sample)

Two sample means (independent or paired)
Used when σ is unknown and sample size is small.

Q13. What is the relationship between Z-test and T-test in hypothesis testing?

A13.
Both tests serve the same purpose—comparing means.

Use Z-test when σ is known or n is large

Use T-test when σ is unknown and n is small
As sample size increases, T-distribution approaches Z-distribution.
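
The convergence can be checked numerically. This small sketch compares two-tailed critical values of the T-distribution with the Z critical value at α = 0.05; the specific degrees of freedom are arbitrary choices for the illustration.

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

# Two-tailed critical value of the T-distribution for increasing degrees of freedom
t_crits = {}
for df in (5, 10, 30, 100, 1000):
    t_crits[df] = t.ppf(1 - alpha / 2, df)
    print(f"df={df:>4}: t critical = {t_crits[df]:.4f}")

print(f"Z critical       = {z_crit:.4f}")
```

The T critical values are always larger than the Z value (heavier tails) and shrink toward it as the degrees of freedom grow.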

Q14. What is a confidence interval, and how is it used to interpret statistical results?

A14.
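A confidence interval gives a range of plausible values for the true parameter, computed from the sample.
For example, a 95% CI comes from a procedure that captures the true mean in about 95% of repeated samples; informally, you're 95% confident the true mean lies within that range.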
It helps interpret results with more context than just a single value.
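
One way to see what the confidence level means is to repeat the sampling many times and count how often the interval contains the true mean. The sketch below does this for t-based intervals; `ci_coverage` and its default parameters are assumptions made for the example.

```python
import numpy as np
from scipy.stats import t

def ci_coverage(true_mean=50.0, sigma=5.0, n=20, confidence=0.95,
                trials=10_000, seed=0):
    """Fraction of simulated t-based confidence intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t_crit = t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    lower = means - t_crit * std_errs
    upper = means + t_crit * std_errs
    return np.mean((lower <= true_mean) & (true_mean <= upper))

coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out close to the nominal confidence level, which is the long-run property the interval is built to have.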

Q15. What is the margin of error, and how does it affect the confidence interval?

A15.
The margin of error is the amount added to and subtracted from the point estimate to create a confidence interval.
Larger margin = wider interval = less precision
It depends on sample size, variability, and confidence level.
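
The dependence on sample size and confidence level can be sketched as follows, assuming for simplicity that the population standard deviation is known so the Z critical value applies. The function `margin_of_error` and the input values are illustrative.

```python
from scipy.stats import norm

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean when the population standard deviation is known."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    return z * sigma / n ** 0.5

# Larger samples shrink the margin; higher confidence widens it
for n in (25, 100, 400):
    print(f"n={n:>3}: 95% margin = {margin_of_error(10, n):.2f}, "
          f"99% margin = {margin_of_error(10, n, confidence=0.99):.2f}")
```

Because the margin scales with 1/√n, quadrupling the sample size halves it.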

Q16. How is Bayes' Theorem used in statistics, and what is its significance?

A16.
Bayes’ Theorem calculates the probability of an event based on prior knowledge and new evidence.
Formula:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Used in: spam detection, medical testing, AI, etc.
It allows updating probabilities as new data becomes available.
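
In practice P(B) is often not given directly and is obtained from the law of total probability. The sketch below applies this to a medical-test setting; the function name and the prevalence, sensitivity, and false-positive values are hypothetical.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(A | B) via Bayes' Theorem, with P(B) from the law of total probability."""
    # P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Assumed values: 1% prevalence, 95% sensitivity, 5% false-positive rate
posterior = posterior_probability(prior=0.01, sensitivity=0.95,
                                  false_positive_rate=0.05)
print(f"P(disease | positive test) = {posterior:.4f}")
```

Even with an accurate test, the posterior stays modest when the condition is rare, because most positive results come from the much larger healthy group.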

Q17. What is the Chi-square distribution, and when is it used?

A17.
The Chi-square (χ²) distribution is the reference distribution for test statistics on categorical data that measure how far observed counts deviate from expected counts.
It’s used in:

Goodness-of-fit tests

Tests of independence in contingency tables

Q18. What is the Chi-square goodness of fit test, and how is it applied?

A18.
The goodness-of-fit test checks whether an observed frequency distribution matches an expected distribution.
Formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where O = observed, E = expected.
If χ² is large, the model doesn’t fit well.
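
The formula can be evaluated by hand and compared with `scipy.stats.chisquare`. This is a minimal sketch using small made-up counts; the degrees of freedom are taken as the number of categories minus one, which assumes no parameters were estimated from the data.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([18, 22, 20, 25, 15])
expected = np.array([20, 20, 20, 20, 20])

# Sum of (O - E)^2 / E over all categories
chi2_manual = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1
p_manual = chi2.sf(chi2_manual, df)  # right-tail probability

chi2_scipy, p_scipy = chisquare(observed, f_exp=expected)

print(f"Manual: chi2 = {chi2_manual:.2f}, P-value = {p_manual:.4f}")
print(f"SciPy:  chi2 = {chi2_scipy:.2f}, P-value = {p_scipy:.4f}")
```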

Q19. What is the F-distribution, and when is it used in hypothesis testing?

A19.
The F-distribution is used to compare two variances.
It appears in ANOVA and F-tests.
It’s asymmetric and used when testing spread or variability between groups.

Q20. What is an ANOVA test, and what are its assumptions?

A20.
ANOVA (Analysis of Variance) tests whether there are significant differences among three or more group means.
Assumptions:

Independent samples

Normal distribution

Equal variances across groups
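
To show what ANOVA actually compares, the sketch below computes the one-way F statistic by hand as the ratio of between-group to within-group mean squares and checks it against `scipy.stats.f_oneway`. The three small groups are example data, not results from a real study.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [np.array([25, 27, 29]), np.array([20, 22, 21]), np.array([30, 31, 29])]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = f_oneway(*groups)
print(f"Manual: F = {f_manual:.2f}, P-value = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.2f}, P-value = {p_scipy:.4f}")
```

A large F means the group means differ by more than the scatter within groups would explain.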

Q21. What are the different types of ANOVA tests?

A21.

One-way ANOVA: One independent variable (factor)

Two-way ANOVA: Two factors and their interaction

Repeated Measures ANOVA: Same subjects tested across conditions

Q22. What is the F-test, and how does it relate to hypothesis testing?

A22.
An F-test compares two variances to see if they’re significantly different.
It’s used in:

Testing assumptions before T-tests

ANOVA
A high F-value suggests significant variance differences.

# Practical Questions

Q1. Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and interpret the results?

A1.

In [None]:
from scipy import stats

sample = [102, 98, 100, 105, 97, 101, 99]
pop_mean = 100
sample_mean = sum(sample) / len(sample)
std_dev = 3  # known population standard deviation
n = len(sample)

z_score = (sample_mean - pop_mean) / (std_dev / (n ** 0.5))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print(f"Z-score: {z_score:.2f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")


Q2. Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python?

A2.

In [None]:
import numpy as np
from scipy.stats import ttest_1samp

np.random.seed(1)
data = np.random.normal(loc=52, scale=5, size=30)

t_stat, p_val = ttest_1samp(data, popmean=50)

print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_val:.4f}")


Q3. Implement a one-sample Z-test using Python to compare the sample mean with the population mean?

A3.

In [None]:
import numpy as np
from scipy.stats import norm

def one_sample_ztest(data, population_mean, population_std):
    sample_mean = np.mean(data)
    n = len(data)
    z = (sample_mean - population_mean) / (population_std / np.sqrt(n))
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p

sample = [51, 52, 49, 50, 53]
z, p = one_sample_ztest(sample, population_mean=50, population_std=2)
print(f"Z-score: {z:.2f}, P-value: {p:.4f}")


Q4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot?

A4.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

z_critical = norm.ppf(1 - 0.05/2)
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)

plt.plot(x, y, label='Z-distribution')
plt.fill_between(x, y, where=(x <= -z_critical) | (x >= z_critical), color='red', alpha=0.3, label='Rejection region')
plt.axvline(z_critical, color='red', linestyle='--')
plt.axvline(-z_critical, color='red', linestyle='--')
plt.title('Two-tailed Z-Test Rejection Regions')
plt.legend()
plt.grid()
plt.show()


Q5. Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing?

A5.

In [None]:
def plot_type1_type2(mu0=50, mu1=52, sigma=2, n=30, alpha=0.05):
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import norm

    se = sigma / np.sqrt(n)
    x = np.linspace(44, 56, 1000)

    dist_null = norm(mu0, se)
    dist_alt = norm(mu1, se)
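    z_critical = norm.ppf(1 - alpha)  # one-tailed (right-sided) test
    critical_value = mu0 + z_critical * se
    beta = dist_alt.cdf(critical_value)  # P(fail to reject H0 | H1 is true)
    print(f"Type I error (alpha): {alpha:.3f}, Type II error (beta): {beta:.4f}")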

    plt.plot(x, dist_null.pdf(x), label="H₀", color='blue')
    plt.plot(x, dist_alt.pdf(x), label="H₁", color='green')
    plt.axvline(critical_value, color='red', linestyle='--', label='Critical Value')
    plt.fill_between(x, dist_null.pdf(x), where=x > critical_value, color='red', alpha=0.3, label='Type I Error')
    plt.fill_between(x, dist_alt.pdf(x), where=x < critical_value, color='yellow', alpha=0.3, label='Type II Error')

    plt.legend()
    plt.title("Type I and Type II Errors")
    plt.grid()
    plt.show()

plot_type1_type2()


Q6. Write a Python program to perform an independent T-test and interpret the results?

A6.

In [None]:
from scipy.stats import ttest_ind

group1 = [23, 25, 27, 30, 28]
group2 = [22, 24, 26, 23, 25]

t_stat, p_val = ttest_ind(group1, group2)

print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_val:.4f}")

if p_val < 0.05:
    print("Reject the null hypothesis: groups are significantly different.")
else:
    print("Fail to reject the null hypothesis.")


Q7. Perform a paired sample T-test using Python and visualize the comparison results?

A7.

In [None]:
import numpy as np
from scipy.stats import ttest_rel
import matplotlib.pyplot as plt

before = [88, 92, 85, 91, 87]
after = [90, 93, 86, 92, 88]

t_stat, p_val = ttest_rel(before, after)
print(f"Paired T-test T-stat: {t_stat:.2f}, P-value: {p_val:.4f}")

plt.plot(before, label="Before")
plt.plot(after, label="After")
plt.title("Before vs After Comparison")
plt.legend()
plt.grid()
plt.show()


Q8. Simulate data and perform both Z-test and T-test, then compare the results using Python?

A8.

In [None]:
import numpy as np
from scipy.stats import ttest_1samp, norm

np.random.seed(0)
data = np.random.normal(50, 10, 20)
pop_mean = 52
pop_std = 10

# Z-test
z_score = (np.mean(data) - pop_mean) / (pop_std / np.sqrt(len(data)))
z_pval = 2 * (1 - norm.cdf(abs(z_score)))

# T-test
t_stat, t_pval = ttest_1samp(data, pop_mean)

print(f"Z-test: Z = {z_score:.2f}, P = {z_pval:.4f}")
print(f"T-test: T = {t_stat:.2f}, P = {t_pval:.4f}")


Q9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance?

A9.

In [None]:
import numpy as np
from scipy.stats import t

def confidence_interval(data, confidence=0.95):
    n = len(data)
    mean = np.mean(data)
    std_err = np.std(data, ddof=1) / np.sqrt(n)
    # sigma is estimated from a small sample, so use the t critical value (df = n - 1)
    t_crit = t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_crit * std_err
    return (mean - margin, mean + margin)

sample = [52, 50, 49, 51, 53]
ci = confidence_interval(sample)
print(f"95% Confidence Interval: ({ci[0]:.2f}, {ci[1]:.2f})")


Q10. Write a Python program to calculate the margin of error for a given confidence level using sample data?

A10.

In [None]:
import numpy as np
from scipy.stats import t

data = [52, 55, 50, 54, 53, 51]
confidence = 0.95
n = len(data)
std_err = np.std(data, ddof=1) / np.sqrt(n)
# Small sample with unknown sigma: use the t critical value rather than z
t_crit = t.ppf(1 - (1 - confidence) / 2, df=n - 1)

margin_error = t_crit * std_err
print(f"Margin of Error: {margin_error:.2f}")


Q11. Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process?

A11.

In [None]:
def bayes_theorem(prior_A, prob_B_given_A, prob_B):
    return (prob_B_given_A * prior_A) / prob_B

# Example values
prior_A = 0.01       # Disease probability
prob_B_given_A = 0.95  # Test detects disease
prob_B = 0.05        # Overall positive test rate

posterior = bayes_theorem(prior_A, prob_B_given_A, prob_B)
print(f"Posterior Probability: {posterior:.4f}")


Q12. Perform a Chi-square test for independence between two categorical variables in Python?

A12.

In [None]:
from scipy.stats import chi2_contingency

data = [[30, 10], [20, 40]]
chi2, p, dof, expected = chi2_contingency(data)

print(f"Chi-square statistic: {chi2:.2f}")
print(f"P-value: {p:.4f}")


Q13. Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data?

A13.

In [None]:
from scipy.stats import chi2_contingency

observed = [[50, 30], [20, 60]]
_, _, _, expected = chi2_contingency(observed)
print("Expected Frequencies:")
print(expected)


Q14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution?

A14.

In [None]:
from scipy.stats import chisquare

observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

chi2, p = chisquare(observed, f_exp=expected)
print(f"Chi-square Statistic: {chi2:.2f}, P-value: {p:.4f}")


Q15. Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics?

A15.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

x = np.linspace(0, 30, 1000)
df = 4
plt.plot(x, chi2.pdf(x, df), label=f'df={df}')
plt.title('Chi-square Distribution')
plt.xlabel('x')
plt.ylabel('PDF')
plt.grid()
plt.legend()
plt.show()


Q16. Implement an F-test using Python to compare the variances of two random samples?

A16.

In [None]:
import numpy as np
from scipy.stats import f

sample1 = [12, 14, 15, 13, 16]
sample2 = [10, 12, 11, 14, 13]

var1 = np.var(sample1, ddof=1)
var2 = np.var(sample2, ddof=1)

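f_stat = var1 / var2
dfn, dfd = len(sample1) - 1, len(sample2) - 1
# Two-tailed P-value: double the smaller tail probability
p_val = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))
print(f"F-statistic: {f_stat:.2f}, P-value: {p_val:.4f}")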


Q17. Write a Python program to perform an ANOVA test to compare means between multiple groups and interpret the results?

A17.

In [None]:
from scipy.stats import f_oneway

group1 = [25, 27, 29]
group2 = [20, 22, 21]
group3 = [30, 31, 29]

f_stat, p_val = f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.2f}, P-value: {p_val:.4f}")


Q18. Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results?

A18.

In [None]:
import matplotlib.pyplot as plt

# Reuses group1, group2, group3 defined in the previous cell
data = [group1, group2, group3]
plt.boxplot(data)
plt.title("Group Means Comparison - One-way ANOVA")
plt.xlabel("Groups")
plt.ylabel("Values")
plt.grid()
plt.show()


Q19. Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA?

A19.

In [None]:
from scipy.stats import shapiro, levene

def check_anova_assumptions(*groups):
    # Normality
    for i, g in enumerate(groups):
        stat, p = shapiro(g)
        print(f"Group {i+1} Normality P-value: {p:.4f}")

    # Equal variances
    stat, p = levene(*groups)
    print(f"Levene’s Test for Equal Variances P-value: {p:.4f}")

check_anova_assumptions(group1, group2, group3)


Q20. Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the results?

A20.

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

df = pd.DataFrame({
    'score': [80, 85, 88, 90, 95, 70, 75, 78, 82, 85],
    'method': ['A']*5 + ['B']*5,
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']
})

model = ols('score ~ C(method) + C(gender) + C(method):C(gender)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


Q21. Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing?

A21.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

x = np.linspace(0, 5, 1000)
df1, df2 = 5, 10
plt.plot(x, f.pdf(x, df1, df2), label=f"df1={df1}, df2={df2}")
plt.title("F-distribution")
plt.xlabel("x")
plt.ylabel("Density")
plt.grid()
plt.legend()
plt.show()


Q22. Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means?

A22.

In [None]:
from scipy.stats import f_oneway
import matplotlib.pyplot as plt

# Sample groups
group1 = [25, 27, 29]
group2 = [20, 22, 21]
group3 = [30, 31, 29]

# One-way ANOVA
f_stat, p_val = f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.2f}, P-value: {p_val:.4f}")


In [None]:
# Visualize the group means using boxplots
data = [group1, group2, group3]
plt.boxplot(data, labels=["Group 1", "Group 2", "Group 3"])
plt.title("Group Means Comparison - One-way ANOVA")
plt.xlabel("Groups")
plt.ylabel("Values")
plt.grid()
plt.show()


Q23. Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means?

A23.

In [None]:
import numpy as np
from scipy.stats import ttest_1samp

data = np.random.normal(loc=50, scale=5, size=100)
t_stat, p_val = ttest_1samp(data, 52)

print(f"T-statistic: {t_stat:.2f}, P-value: {p_val:.4f}")


Q24. Perform a hypothesis test for population variance using a Chi-square distribution and interpret the results?

A24.

In [None]:
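import numpy as np
from scipy.stats import chi2

sample = [10, 12, 13, 11, 14]
n = len(sample)
sample_var = np.var(sample, ddof=1)
pop_var = 4  # hypothesized population variance under H0

chi2_stat = (n - 1) * sample_var / pop_var
# Right-tailed test: H1 says the population variance is greater than pop_var
p_val = chi2.sf(chi2_stat, df=n - 1)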

print(f"Chi-square statistic: {chi2_stat:.2f}, P-value: {p_val:.4f}")


Q25. Write a Python script to perform a Z-test for comparing proportions between two datasets or groups?

A25.

In [None]:
from statsmodels.stats.proportion import proportions_ztest

success = [40, 30]
nobs = [100, 90]

z_stat, p_val = proportions_ztest(success, nobs)
print(f"Z-statistic: {z_stat:.2f}, P-value: {p_val:.4f}")


Q26. Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results?

A26.

In [None]:
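import numpy as np
from scipy.stats import f

data1 = [20, 21, 19, 22, 20]
data2 = [25, 28, 24, 27, 26]

f_stat = np.var(data1, ddof=1) / np.var(data2, ddof=1)
dfn, dfd = len(data1) - 1, len(data2) - 1
p_val = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))  # two-tailed
print(f"F-statistic: {f_stat:.2f}, P-value: {p_val:.4f}")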


Q27. Perform a Chi-square test for goodness of fit with simulated data and analyze the results?

A27.

In [None]:
from scipy.stats import chisquare

observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

chi2_stat, p_val = chisquare(observed, f_exp=expected)
print(f"Chi-square Stat: {chi2_stat:.2f}, P-value: {p_val:.4f}")
