1. What is hypothesis testing in statistics?
Answer: Hypothesis testing is a statistical method used to determine whether there is enough evidence in a sample of data to 
support a specific hypothesis about the population.


2. What is the null hypothesis, and how does it differ from the alternative hypothesis?
Answer: The null hypothesis (H₀) states that no effect or relationship exists, while the alternative hypothesis
(H₁) states that an effect or relationship does exist.

3. What is the significance level in hypothesis testing, and why is it important?
Answer: The significance level (α) is the threshold used to decide whether to reject the null hypothesis,
typically set at 0.05. It is the probability of making a Type I error, that is, of rejecting the null hypothesis when it is actually true.
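
A small simulation can make the link between α and the Type I error rate concrete. The sketch below is illustrative only: the population values, sample size, trial count, and seed are assumptions chosen for the example, and it uses the standard library's `NormalDist` rather than SciPy. Every sample is drawn with the null hypothesis true, so the share of rejections should land close to α.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # arbitrary seed so the run is repeatable
alpha = 0.05
pop_mean, pop_std, n, trials = 50.0, 10.0, 30, 5000  # illustrative values
std_normal = NormalDist()

rejections = 0
for _ in range(trials):
    # Draw a sample for which the null hypothesis (mean == pop_mean) is true
    sample = [random.gauss(pop_mean, pop_std) for _ in range(n)]
    z = (fmean(sample) - pop_mean) / (pop_std / n ** 0.5)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    if p_value <= alpha:
        rejections += 1  # a false positive, because H0 is true here

type_1_rate = rejections / trials
print(f"Observed Type I error rate: {type_1_rate:.3f} (alpha = {alpha})")
```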


4. What does a P-value represent in hypothesis testing?
Answer: The P-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.


5. How do you interpret the P-value in hypothesis testing?
Answer: If the P-value is less than or equal to the significance level (α), the null hypothesis is rejected. If it is
greater, the null hypothesis is not rejected.


6. What are Type 1 and Type 2 errors in hypothesis testing?
Answer: A Type 1 error (false positive) occurs when the null hypothesis is wrongly rejected, while a Type 2 error (false negative) occurs when the null hypothesis is wrongly not rejected.
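
As a rough illustration, the two error probabilities can be computed for a one-tailed Z-test. This is a sketch under stated assumptions: the test statistic is standard normal under H₀, and under H₁ it is shifted by a hypothetical `effect` of 2 standard errors. Neither value comes from a real study.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05   # chosen Type 1 error probability
effect = 2.0   # assumed shift of the test statistic under H1 (hypothetical)

# Reject H0 when the statistic exceeds this one-tailed critical value
critical_value = std_normal.inv_cdf(1 - alpha)

# Type 2 error: the statistic falls below the critical value although H1 is true
beta = std_normal.cdf(critical_value - effect)
power = 1 - beta

print(f"Critical value: {critical_value:.4f}")
print(f"Type 2 error (beta): {beta:.4f}")
print(f"Power (1 - beta): {power:.4f}")
```

Lowering α pushes the critical value to the right, which raises β for the same effect, so the two error types trade off against each other.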



7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?
Answer: A one-tailed test looks for an effect in one direction (either greater or lesser), while a two-tailed test looks for an effect in both directions.
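
The choice of tail changes the P-value for the same test statistic. The sketch below assumes a hypothetical Z-statistic of 1.8 and α = 0.05, and uses the standard library's `NormalDist` for the normal CDF.

```python
from statistics import NormalDist

z = 1.8        # hypothetical test statistic
alpha = 0.05
std_normal = NormalDist()

p_right = 1 - std_normal.cdf(z)            # H1: mean is greater
p_left = std_normal.cdf(z)                 # H1: mean is smaller
p_two = 2 * (1 - std_normal.cdf(abs(z)))   # H1: mean is different

for name, p in [("right-tailed", p_right), ("left-tailed", p_left), ("two-tailed", p_two)]:
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {decision}")
```

With these numbers the right-tailed test rejects H₀ while the two-tailed test does not, which is why the direction of the test should be fixed before looking at the data.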

8. What is the Z-test, and when is it used in hypothesis testing?
Answer: The Z-test is used to test the mean of a population when the sample size is large (n > 30) and the population variance is known.


9. How do you calculate the Z-score, and what does it represent in hypothesis testing?
Answer: The Z-score is calculated as:

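$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Here $\bar{X}$ is the sample mean, $\mu$ the population mean, $\sigma$ the population standard deviation, and $n$ the sample size. The Z-score measures how many standard errors ($\sigma / \sqrt{n}$) the sample mean lies from the population mean.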
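
Plugging numbers into the Z-score calculation shows how the pieces fit together. The values below are made up for illustration.

```python
import math

sample_mean = 52.0   # illustrative sample mean
pop_mean = 50.0      # population mean under H0
pop_std = 10.0       # known population standard deviation
n = 25               # sample size

# Standard error of the mean: sigma / sqrt(n)
standard_error = pop_std / math.sqrt(n)

# Z-score: distance of the sample mean from the population mean, in standard errors
z = (sample_mean - pop_mean) / standard_error

print(f"Standard error: {standard_error}")  # 2.0
print(f"Z-score: {z}")                      # 1.0
```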


10. What is the T-distribution, and when should it be used instead of the normal distribution?
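Answer: The T-distribution is used when the population variance is unknown and must be estimated from the sample, which matters most when the sample size is small (roughly n < 30). It has heavier tails than the normal distribution and approaches the normal distribution as the degrees of freedom increase.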
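
One way to see the heavier tails is to compare two-tailed 95% critical values. This sketch assumes SciPy is available; the degrees of freedom listed are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

upper_q = 0.975  # upper quantile for a two-tailed 95% interval
z_crit = norm.ppf(upper_q)

# Critical value of the T-distribution for several degrees of freedom
t_crits = {df: t.ppf(upper_q, df) for df in (5, 10, 30, 100)}

print(f"Normal critical value: {z_crit:.4f}")
for df, t_crit in t_crits.items():
    print(f"T critical value (df={df}): {t_crit:.4f}")
```

The T critical values are larger than the normal one and shrink toward it as the degrees of freedom grow.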



11. What is the difference between a Z-test and a T-test?
Answer: The Z-test is used for large samples with known population variance, while the T-test is used for small samples with unknown population variance.

12. What is the T-test, and how is it used in hypothesis testing?
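Answer: The T-test checks whether a sample mean differs significantly from a reference value (one-sample T-test) or whether the means of two groups differ (independent or paired T-test). It uses the sample standard deviation in place of the unknown population value and is particularly useful when the sample size is small.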



13. What is the relationship between Z-test and T-test in hypothesis testing?
Answer: Both tests are used to test for differences in means, but the T-test is used when the sample size is small and the population variance is unknown, whereas the Z-test is used for large samples with known variance.



14. What is a confidence interval, and how is it used to interpret statistical results?
Answer: A confidence interval (CI) is a range of values, computed from the sample, that is expected to contain the population parameter at a stated confidence level (for example 95%). Its width expresses the uncertainty in the sample estimate.
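
The confidence level describes how often the procedure captures the true parameter over repeated sampling. The simulation below is a sketch under assumed values (true mean, standard deviation, sample size, trial count, and seed are all illustrative) and assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
true_mean, true_std, n, trials = 100.0, 15.0, 20, 2000  # illustrative values
confidence = 0.95
t_crit = t.ppf((1 + confidence) / 2, df=n - 1)

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, true_std, n)
    std_err = sample.std(ddof=1) / np.sqrt(n)
    lower = sample.mean() - t_crit * std_err
    upper = sample.mean() + t_crit * std_err
    # Count the interval if it contains the true mean
    if lower <= true_mean <= upper:
        covered += 1

coverage = covered / trials
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```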



15. What is the margin of error, and how does it affect the confidence interval?
Answer: The margin of error defines the range above and below the sample statistic within which the true population parameter is likely to lie. A larger margin of error increases the width of the confidence interval.
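
The margin of error for a mean is the critical value times the standard error, so it shrinks with the square root of the sample size. The sketch below assumes a known standard deviation of 15 and a 95% confidence level, both chosen for illustration.

```python
from statistics import NormalDist

confidence = 0.95
sigma = 15.0  # assumed population standard deviation (illustrative)
z_crit = NormalDist().inv_cdf((1 + confidence) / 2)

margins = {}
for n in (25, 100, 400):
    # Margin of error = critical value * standard error
    margins[n] = z_crit * sigma / n ** 0.5
    width = 2 * margins[n]  # the interval is mean +/- margin
    print(f"n={n}: margin of error = {margins[n]:.3f}, CI width = {width:.3f}")
```

Quadrupling the sample size halves the margin of error, and with it the width of the interval.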



16. How is Bayes' Theorem used in statistics, and what is its significance?
Answer: Bayes' Theorem calculates the probability of an event based on prior knowledge and new evidence, allowing for the updating of beliefs.
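
A minimal sketch of the update for a diagnostic test follows. The prevalence and sensitivity are illustrative, and `false_positive_rate` is an assumed value introduced here so that the overall probability of a positive test can be derived with the law of total probability rather than supplied directly.

```python
prior = 0.01                # P(disease): assumed prevalence
sensitivity = 0.95          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease), assumed for illustration

# Law of total probability: P(positive)
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' Theorem: P(disease | positive)
posterior = sensitivity * prior / p_positive

print(f"P(positive): {p_positive:.4f}")
print(f"P(disease | positive): {posterior:.4f}")
```

Even with a sensitive test, the posterior stays well below the sensitivity because the disease is rare in this example.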



17. What is the Chi-square distribution, and when is it used?
Answer: The Chi-square distribution is used in hypothesis testing for categorical data, such as in tests of independence and goodness of fit.



18. What is the Chi-square goodness of fit test, and how is it applied?
Answer: The Chi-square goodness of fit test compares observed frequencies to expected frequencies to determine if a sample matches a population distribution.
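
The test statistic is the sum of (observed − expected)² / expected over all categories. The sketch below computes it by hand for made-up counts and assumes SciPy is available for the P-value.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([40, 60, 50])  # illustrative observed counts
expected = np.array([50, 50, 50])  # counts expected under H0

# Chi-square statistic: sum of squared deviations scaled by expected counts
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus one
dof = len(observed) - 1

# Upper-tail probability of the Chi-square distribution
p_value = chi2.sf(chi2_stat, dof)

print(f"Chi-Square Statistic: {chi2_stat:.4f}, df: {dof}, P-value: {p_value:.4f}")
```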

19. What is the F-distribution, and when is it used in hypothesis testing?
Answer: The F-distribution is used to compare variances between two or more groups, often in the context of ANOVA or regression analysis.



20. What is an ANOVA test, and what are its assumptions?
Answer: The ANOVA (Analysis of Variance) test compares the means of three or more groups. Assumptions include normality of data, independence of samples, and equal variances.
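
The F-statistic in a one-way ANOVA is the ratio of between-group to within-group mean squares. The sketch below works this out by hand on toy data and compares it with `scipy.stats.f_oneway`; the data are invented for illustration.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Toy data: three small groups
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = f_oneway(*groups)
print(f"Manual: F = {f_manual:.4f}, P-value = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.4f}, P-value = {p_scipy:.4f}")
```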

21. What are the different types of ANOVA tests?
Answer: One-way ANOVA compares group means across the levels of a single factor, Two-way ANOVA analyzes two factors and their interaction, and Repeated Measures ANOVA is used when the same subjects are measured multiple times.

22. What is the F-test, and how does it relate to hypothesis testing?
Answer: The F-test compares the variances of two populations to determine if they are significantly different, often used in ANOVA or regression analysis.



In [None]:
#Practical

In [None]:
#1. Perform a Z-test for comparing a sample mean to a known population mean

import numpy as np
from scipy.stats import norm

def z_test(sample, pop_mean, pop_std):
    n = len(sample)
    sample_mean = np.mean(sample)
    z_score = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
    p_value = 2 * (1 - norm.cdf(abs(z_score)))
    
    print(f"Z-score: {z_score:.4f}")
    print(f"P-value: {p_value:.4f}")

    if p_value < 0.05:
        print("Reject the null hypothesis (significant difference).")
    else:
        print("Fail to reject the null hypothesis (no significant difference).")

# Example usage
sample_data = np.random.normal(50, 10, 30)
z_test(sample_data, pop_mean=50, pop_std=10)


In [None]:
#2. Simulate random data and calculate the P-value

import numpy as np
from scipy.stats import norm

np.random.seed(42)
sample = np.random.normal(100, 15, 50)  # Mean=100, Std=15, Sample size=50
pop_mean = 102
pop_std = 15

z_score = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z_score)))

print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")


In [None]:
#3. Implement a One-Sample Z-test

import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample, pop_mean, pop_std):
    n = len(sample)
    sample_mean = np.mean(sample)
    z_score = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
    p_value = 2 * (1 - norm.cdf(abs(z_score)))
    
    return z_score, p_value

# Example
sample = np.random.normal(60, 5, 40)
z, p = one_sample_z_test(sample, pop_mean=58, pop_std=5)
print(f"Z-score: {z:.4f}, P-value: {p:.4f}")


In [None]:
#4. Perform a Two-Tailed Z-test and Visualize Decision Region

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

sample = np.random.normal(100, 15, 50)
pop_mean = 102
pop_std = 15

z_score = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z_score)))

x = np.linspace(-4, 4, 1000)
plt.plot(x, norm.pdf(x), label="Standard Normal Distribution")
plt.axvline(-1.96, color="red", linestyle="dashed", label="Critical Value (-1.96)")
plt.axvline(1.96, color="red", linestyle="dashed", label="Critical Value (1.96)")
plt.axvline(z_score, color="blue", linestyle="dotted", label=f"Z-score ({z_score:.2f})")
plt.legend()
plt.title("Two-Tailed Z-test Decision Region")
plt.show()

In [None]:
#5. Function to Calculate and Visualize Type 1 and Type 2 Errors

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def visualize_errors(alpha=0.05, effect=2):
    # effect: mean of the test statistic under the alternative hypothesis
    x = np.linspace(-4, 6, 1000)
    null_dist = norm.pdf(x, loc=0, scale=1)
    alt_dist = norm.pdf(x, loc=effect, scale=1)

    # One-tailed critical value; beta is the alternative's area below it
    critical_value = norm.ppf(1 - alpha)
    beta = norm.cdf(critical_value, loc=effect, scale=1)

    plt.plot(x, null_dist, label="Null Distribution", color="blue")
    plt.plot(x, alt_dist, label="Alternative Distribution", color="green")
    plt.fill_between(x, null_dist, where=(x > critical_value), color="red", alpha=0.3, label=f"Type 1 Error (alpha={alpha:.2f})")
    plt.fill_between(x, alt_dist, where=(x < critical_value), color="purple", alpha=0.3, label=f"Type 2 Error (beta={beta:.2f})")

    plt.legend()
    plt.title("Type 1 and Type 2 Errors")
    plt.show()

visualize_errors()


In [None]:
#6. Perform an Independent T-test

import numpy as np
from scipy.stats import ttest_ind

group1 = np.random.normal(100, 15, 50)
group2 = np.random.normal(105, 15, 50)

t_stat, p_value = ttest_ind(group1, group2)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis (significant difference).")
else:
    print("Fail to reject the null hypothesis (no significant difference).")


In [None]:
#7. Perform a Paired Sample T-test and Visualize the Comparison

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

before = np.random.normal(100, 10, 30)
after = before + np.random.normal(2, 5, 30)  # Small improvement after treatment

t_stat, p_value = ttest_rel(before, after)

plt.boxplot([before, after], labels=["Before", "After"])
plt.title("Paired Sample T-test Comparison")
plt.show()

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")


In [None]:
#8. Simulate Data and Compare Z-test vs. T-test

import numpy as np
from scipy.stats import norm, ttest_1samp

sample = np.random.normal(100, 15, 30)
pop_mean = 102
pop_std = 15

# Z-test
z_score = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value_z = 2 * (1 - norm.cdf(abs(z_score)))

# T-test
t_stat, p_value_t = ttest_1samp(sample, pop_mean)

print(f"Z-test -> Z-score: {z_score:.4f}, P-value: {p_value_z:.4f}")
print(f"T-test -> T-statistic: {t_stat:.4f}, P-value: {p_value_t:.4f}")

In [None]:
#9. Function to Calculate Confidence Interval for Sample Mean

import numpy as np
from scipy.stats import t

def confidence_interval(sample, confidence=0.95):
    n = len(sample)
    mean = np.mean(sample)
    std_err = np.std(sample, ddof=1) / np.sqrt(n)
    t_crit = t.ppf((1 + confidence) / 2, df=n-1)
    
    lower_bound = mean - t_crit * std_err
    upper_bound = mean + t_crit * std_err
    
    return lower_bound, upper_bound

sample = np.random.normal(100, 15, 30)
ci_lower, ci_upper = confidence_interval(sample)

print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")

In [None]:
#10. Calculate Margin of Error for a Given Confidence Level

import numpy as np
from scipy.stats import norm

def margin_of_error(sample, confidence=0.95):
    n = len(sample)
    std_dev = np.std(sample, ddof=1)
    std_err = std_dev / np.sqrt(n)
    z_crit = norm.ppf((1 + confidence) / 2)
    
    return z_crit * std_err

sample = np.random.normal(100, 15, 50)
moe = margin_of_error(sample)
print(f"Margin of Error: {moe:.4f}")


In [None]:
#11. Bayesian Inference Using Bayes' Theorem

def bayes_theorem(prior_A, likelihood_B_given_A, prior_B):
    posterior_A_given_B = (likelihood_B_given_A * prior_A) / prior_B
    return posterior_A_given_B

prior_A = 0.01  # Probability of having a disease
likelihood_B_given_A = 0.95  # Sensitivity (true positive rate)
prior_B = 0.05  # Probability of testing positive

posterior = bayes_theorem(prior_A, likelihood_B_given_A, prior_B)
print(f"Posterior Probability: {posterior:.4f}")


In [None]:
#12. Perform a Chi-Square Test for Independence

import numpy as np
import scipy.stats as stats

observed = np.array([[50, 30], [20, 40]])
# Named chi2_stat so it does not shadow the scipy.stats.chi2 distribution
chi2_stat, p, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-Square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p:.4f}")
print(f"Expected Frequencies:\n{expected}")


In [None]:
#13. Calculate Expected Frequencies for Chi-Square Test


def expected_frequencies(observed):
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    total = observed.sum()
    
    expected = (row_totals @ col_totals) / total
    return expected

observed = np.array([[50, 30], [20, 40]])
expected = expected_frequencies(observed)
print("Expected Frequencies:\n", expected)


In [None]:
#14. Perform a Goodness-of-Fit Test

observed = np.array([40, 60, 50])
expected = np.array([50, 50, 50])

# Named chi2_stat so it does not shadow the scipy.stats.chi2 distribution
chi2_stat, p = stats.chisquare(observed, expected)
print(f"Chi-Square Statistic: {chi2_stat:.4f}, P-value: {p:.4f}")


In [None]:
#15. Simulate and Visualize Chi-Square Distribution


import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import chi2

df = 4
x = np.linspace(0, 20, 1000)
plt.plot(x, chi2.pdf(x, df), label=f"Chi-Square (df={df})")
plt.title("Chi-Square Distribution")
plt.legend()
plt.show()


In [None]:
#16. Perform an F-Test to Compare Variances

sample1 = np.random.normal(100, 15, 50)
sample2 = np.random.normal(105, 20, 50)

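# F-statistic: ratio of the two sample variances
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
dfn, dfd = len(sample1) - 1, len(sample2) - 1

# Two-tailed P-value: double the smaller tail area
cdf = stats.f.cdf(f_stat, dfn, dfd)
p_value = 2 * min(cdf, 1 - cdf)

print(f"F-Statistic: {f_stat:.4f}, P-value: {p_value:.4f}")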


In [None]:
#17. Perform ANOVA to Compare Means


from scipy.stats import f_oneway

group1 = np.random.normal(100, 10, 30)
group2 = np.random.normal(105, 10, 30)
group3 = np.random.normal(110, 10, 30)

f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F-Statistic: {f_stat:.4f}, P-value: {p_value:.4f}")


In [None]:
#18. One-Way ANOVA with Visualization

import seaborn as sns
import pandas as pd

data = {'Group': ['A']*30 + ['B']*30 + ['C']*30,
        'Score': np.concatenate([group1, group2, group3])}

df = pd.DataFrame(data)
sns.boxplot(x='Group', y='Score', data=df)
plt.show()


In [None]:
#19. Check ANOVA Assumptions


from scipy.stats import levene, shapiro

def check_anova_assumptions(*groups):
    for i, group in enumerate(groups):
        stat, p = shapiro(group)
        print(f"Group {i+1} Normality Test: P-value = {p:.4f}")

    stat, p = levene(*groups)
    print(f"Levene's Test (Equal Variances): P-value = {p:.4f}")

check_anova_assumptions(group1, group2, group3)


In [None]:
#20. Perform a Two-Way ANOVA

import statsmodels.api as sm
from statsmodels.formula.api import ols

df['Factor2'] = np.random.choice(['X', 'Y'], size=len(df))
model = ols('Score ~ C(Group) + C(Factor2) + C(Group):C(Factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


In [None]:
#21. Visualize F-Distribution


from scipy.stats import f

x = np.linspace(0, 5, 1000)
plt.plot(x, f.pdf(x, 5, 10), label="df=(5,10)")
plt.title("F-Distribution")
plt.legend()
plt.show()


In [None]:
#22. One-Way ANOVA with Boxplots

sns.boxplot(x="Group", y="Score", data=df)
plt.title("One-Way ANOVA Boxplot")
plt.show()


In [None]:
#23. Simulate Data and Perform Hypothesis Testing

sample = np.random.normal(100, 15, 50)
t_stat, p_value = stats.ttest_1samp(sample, 102)
print(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.4f}")


In [None]:
#24. Hypothesis Test for Population Variance Using Chi-Square


sample = np.random.normal(100, 15, 50)
sample_var = np.var(sample, ddof=1)
pop_var = 225  # Population variance

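# Test statistic (n - 1) * s^2 / sigma0^2 follows Chi-square with n - 1 degrees of freedom under H0
chi2_stat = (len(sample) - 1) * sample_var / pop_var
# Upper-tail P-value (H1: the variance is greater than pop_var)
p_value = 1 - stats.chi2.cdf(chi2_stat, len(sample) - 1)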

print(f"Chi-Square Statistic: {chi2_stat:.4f}, P-value: {p_value:.4f}")


In [None]:
#25. Perform a 2-Proportion Z-Test

from statsmodels.stats.proportion import proportions_ztest

success = np.array([50, 30])
n = np.array([100, 80])

z_stat, p_value = proportions_ztest(success, n)
print(f"Z-Statistic: {z_stat:.4f}, P-value: {p_value:.4f}")


In [None]:
#26. F-Test for Comparing Variances

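# F-statistic: ratio of the two sample variances
f_stat = np.var(group1, ddof=1) / np.var(group2, ddof=1)
dfn, dfd = len(group1) - 1, len(group2) - 1

# Two-tailed P-value: double the smaller tail area
cdf = stats.f.cdf(f_stat, dfn, dfd)
p_value = 2 * min(cdf, 1 - cdf)

print(f"F-Statistic: {f_stat:.4f}, P-value: {p_value:.4f}")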
