1. What is hypothesis testing in statistics?

Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to reject a claim (a hypothesis) about a population parameter.

2. What is the null hypothesis, and how does it differ from the alternative hypothesis?


The null hypothesis (H₀) states that there is no effect or difference (e.g., "There is no difference between the means of two groups").
The alternative hypothesis (H₁ or Ha) states that there is an effect or difference (e.g., "The means of the two groups differ"). Whether a difference is "significant" is the test's conclusion, not part of the hypothesis.

3. What is the significance level in hypothesis testing, and why is it important?

The significance level (α) is the probability of rejecting the null hypothesis when it is actually true. It is important because it helps control the likelihood of making a Type I error (false positive). A common value for α is 0.05 (5%).

4. What does a P-value represent in hypothesis testing?

The P-value is the probability of obtaining results at least as extreme as the observed data, assuming the null hypothesis is true. It is not the probability that H₀ is true.
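To make the definition concrete, the sketch below estimates a P-value by brute force: it simulates many sample means under H₀ and counts how often they land at least as far from the hypothesized mean as the observed one. The numbers (hypothesized mean 105, known σ = 15, n = 30, observed mean 100) are illustrative assumptions, not data from this notebook; the analytic two-tailed Z-test P-value is printed alongside for comparison.

```python
import numpy as np
from scipy import stats

# Illustrative assumptions: H0 says the population mean is 105, sigma is known
mu0, sigma, n = 105, 15, 30
observed_mean = 100.0  # hypothetical observed sample mean

rng = np.random.default_rng(0)
# Simulate 50,000 sample means that could occur if H0 were true
null_means = rng.normal(mu0, sigma, size=(50_000, n)).mean(axis=1)

# Two-tailed: "at least as extreme" = at least as far from mu0 as the observed mean
observed_dist = abs(observed_mean - mu0)
p_sim = np.mean(np.abs(null_means - mu0) >= observed_dist)

# Analytic two-tailed P-value from the Z-statistic, for comparison
z = (observed_mean - mu0) / (sigma / np.sqrt(n))
p_exact = 2 * stats.norm.sf(abs(z))

print(f"Simulated P-value: {p_sim:.4f}")
print(f"Analytic P-value:  {p_exact:.4f}")
```

The simulated value should agree with the analytic one to about two decimal places; more simulated samples tighten the agreement.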

5. How do you interpret the P-value in hypothesis testing?


If P-value ≤ α: Reject the null hypothesis (the result is statistically significant at level α).
If P-value > α: Fail to reject the null hypothesis (insufficient evidence to conclude H₁ is true; this does not prove H₀).

6. What are Type 1 and Type 2 errors in hypothesis testing?


Type I error (False Positive): Rejecting H₀ when it is actually true. Its probability is α.
Type II error (False Negative): Failing to reject H₀ when H₁ is actually true. Its probability is β, and 1 − β is the power of the test.
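A rough way to see both error types is to simulate many experiments and count wrong decisions. This sketch assumes a one-sample T-test at α = 0.05 with n = 30 and σ = 15; the true mean of 105 under H₁ is a hypothetical effect size chosen for illustration. The first rate should land close to α, while the second depends on the effect size and sample size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, n_trials = 0.05, 30, 20_000
mu0, sigma = 100, 15

# Case 1: H0 is true (true mean = 100). Every rejection is a Type I error.
data_h0 = rng.normal(mu0, sigma, size=(n_trials, n))
p_h0 = stats.ttest_1samp(data_h0, popmean=mu0, axis=1).pvalue
type1_rate = np.mean(p_h0 <= alpha)

# Case 2: H1 is true (hypothetical true mean = 105). Every non-rejection is a Type II error.
data_h1 = rng.normal(105, sigma, size=(n_trials, n))
p_h1 = stats.ttest_1samp(data_h1, popmean=mu0, axis=1).pvalue
type2_rate = np.mean(p_h1 > alpha)

print(f"Estimated Type I error rate:  {type1_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type II error rate: {type2_rate:.3f} (power = {1 - type2_rate:.3f})")
```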


7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?


One-tailed test: Tests for an effect in one direction specified in advance (e.g., H₁: μ > μ₀), placing the entire rejection region α in one tail.
Two-tailed test: Tests for a difference in either direction (H₁: μ ≠ μ₀), splitting α between the two tails (α/2 each).
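For a Z-statistic, the two kinds of test differ only in how the tail area is counted. In this sketch the observed value z = 1.8 is hypothetical; with it, the one-tailed P-value falls below 0.05 while the two-tailed one does not, which is why the direction of a one-tailed test has to be fixed before looking at the data.

```python
from scipy.stats import norm

alpha = 0.05
z = 1.8  # hypothetical observed Z-statistic

# One-tailed (H1: mu > mu0): the whole alpha sits in the upper tail
p_one = norm.sf(z)                   # P(Z >= z)
crit_one = norm.ppf(1 - alpha)       # about 1.645

# Two-tailed (H1: mu != mu0): alpha/2 in each tail
p_two = 2 * norm.sf(abs(z))          # P(|Z| >= |z|)
crit_two = norm.ppf(1 - alpha / 2)   # about 1.960

print(f"One-tailed: P = {p_one:.4f}, critical value = {crit_one:.3f}")
print(f"Two-tailed: P = {p_two:.4f}, critical values = +/-{crit_two:.3f}")
```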


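8. What is the Z-test, and when is it used in hypothesis testing?

The Z-test compares a sample mean with a hypothesized population mean (or compares two means, or two proportions) when the population variance is known and the sampling distribution is approximately normal, either because the data are normal or because the sample is large.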

9. How do you calculate the Z-score, and what does it represent in hypothesis testing?

$$Z = \frac{X - \mu}{\sigma}$$

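It represents the number of standard deviations a data point X lies above (positive Z) or below (negative Z) the mean μ. When testing a sample mean of n observations, X is replaced by the sample mean $\bar{X}$ and σ by the standard error $\sigma/\sqrt{n}$, giving $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.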

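10. What is the T-distribution, and when should it be used instead of the normal distribution?

The T-distribution is used when the population variance is unknown and must be estimated from the sample (with approximately normal data). It has heavier tails than the normal distribution to account for the extra uncertainty in that estimate, and its shape depends on the degrees of freedom (n − 1 for a one-sample test). The difference matters most for small samples (the common rule of thumb is n < 30); as n grows, the T-distribution approaches the standard normal.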
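The heavier tails show up directly in the critical values. This short sketch prints the two-sided 5% critical value of the T-distribution for a few degrees of freedom (chosen arbitrarily) next to the normal value of about 1.96; the gap is large for small samples and nearly vanishes by a few hundred degrees of freedom.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-sided 5% critical value of the standard normal
dofs = [2, 5, 10, 29, 100, 1000]
t_crits = {dof: t.ppf(0.975, dof) for dof in dofs}

for dof, t_crit in t_crits.items():
    print(f"df={dof:>4}: t critical = {t_crit:.3f}  (normal: {z_crit:.3f})")
```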

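11. What is the difference between a Z-test and a T-test?

Z-test: Used when the population variance is known (or the sample is large enough that the sample variance is a reliable estimate); the test statistic follows the standard normal distribution.
T-test: Used when the population variance is unknown and estimated from the sample; the test statistic follows a T-distribution with n − 1 degrees of freedom, which matters most when the sample is small.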

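12. What is the T-test, and how is it used in hypothesis testing?

The T-test compares means when the population variance is unknown. It comes in three common forms: a one-sample test (a sample mean against a hypothesized value), an independent two-sample test (the means of two separate groups), and a paired test (before/after measurements on the same subjects). The resulting T-statistic is converted to a P-value using the T-distribution.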

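13. What is the relationship between Z-test and T-test in hypothesis testing?

Both tests use the same structure: (estimate − hypothesized value) / standard error. The Z-test uses the known population standard deviation σ, while the T-test substitutes the sample standard deviation s, so its statistic follows a T-distribution instead of the normal. As the sample size grows, the T-distribution converges to the normal and the two tests give nearly identical results.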

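14. What is a confidence interval, and how is it used to interpret statistical results?

A confidence interval is a range of plausible values for a population parameter, computed from sample data. A 95% confidence level means that the procedure, repeated over many samples, would produce intervals containing the true parameter about 95% of the time; it does not mean that one specific computed interval has a 95% probability of containing it. Narrower intervals indicate more precise estimates, and a 95% interval that excludes the hypothesized value corresponds to rejecting H₀ in a two-tailed test at α = 0.05.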
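The repeated-sampling meaning of a confidence level can be checked by simulation. This sketch assumes normal data with a true mean of 100 (σ = 15, n = 30, all illustrative), builds a 95% T-based interval from each of 10,000 simulated samples, and reports the fraction that contain the true mean, which should come out close to 0.95.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_mean, sigma, n, n_reps = 100, 15, 30, 10_000

# Each row is one simulated sample of size n
samples = rng.normal(true_mean, sigma, size=(n_reps, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)   # estimated standard errors
t_crit = stats.t.ppf(0.975, df=n - 1)

# 95% interval for every sample, then check which ones cover the true mean
lower, upper = means - t_crit * sems, means + t_crit * sems
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Fraction of 95% intervals containing the true mean: {coverage:.3f}")
```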

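15. What is the margin of error, and how does it affect the confidence interval?

The margin of error is the half-width of a confidence interval (critical value × standard error), so the interval is estimate ± margin of error. A larger margin means a wider interval and less precision. The margin grows with the confidence level and shrinks as the sample size increases, in proportion to 1/√n.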
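For a Z-based interval on a mean, the margin of error is z* · σ/√n, where z* is the critical value for the chosen confidence level. The helper below (`margin_of_error_z`, a hypothetical name, with an assumed known σ = 15) shows both effects: quadrupling n halves the margin, and raising the confidence level from 95% to 99% widens it.

```python
import numpy as np
from scipy.stats import norm

sigma = 15  # assumed known population standard deviation (illustrative)

def margin_of_error_z(n, confidence=0.95):
    """Half-width of a Z-based confidence interval for a mean (hypothetical helper)."""
    z_crit = norm.ppf((1 + confidence) / 2)
    return z_crit * sigma / np.sqrt(n)

for n in [25, 100, 400]:
    print(f"n={n:>3}: 95% MOE = {margin_of_error_z(n):.2f}, "
          f"99% MOE = {margin_of_error_z(n, 0.99):.2f}")
```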

16. How is Bayes' Theorem used in statistics, and what is its significance?

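Bayes' Theorem updates the probability of a hypothesis H (the prior) in light of new evidence E to give the posterior probability: $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$. It is the foundation of Bayesian inference and is widely used in decision-making, for example to interpret a positive diagnostic test when the condition is rare.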

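17. What is the Chi-square distribution, and when is it used?

The Chi-square distribution with k degrees of freedom is the distribution of a sum of squares of k independent standard normal variables. It is used for categorical data analysis, such as goodness-of-fit tests and tests of independence, and for inference about a single population variance.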
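A Chi-square variable with k degrees of freedom can be built as the sum of squares of k independent standard normal variables. This sketch, with an arbitrary choice of k = 4, squares and adds standard normal draws and compares the simulated mean and variance with the theoretical values k and 2k reported by `scipy.stats.chi2`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k = 4  # degrees of freedom = number of squared standard normals (arbitrary choice)

# Each row: k standard normal draws, squared and summed
samples = (rng.standard_normal(size=(100_000, k)) ** 2).sum(axis=1)

print(f"Simulated mean {samples.mean():.3f} vs theory {stats.chi2.mean(k):.3f}")
print(f"Simulated variance {samples.var():.3f} vs theory {stats.chi2.var(k):.3f}")
```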

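18. What is the Chi-square goodness of fit test, and how is it applied?

It tests whether observed category counts fit an expected distribution. For each category, compare the observed count O with the expected count E under H₀, compute $\chi^2 = \sum \frac{(O - E)^2}{E}$, and compare it with a Chi-square distribution with k − 1 degrees of freedom (k categories, minus one more for each parameter estimated from the data). A large statistic (small P-value) indicates a poor fit.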
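The goodness-of-fit statistic sums (O − E)²/E over the categories, where O and E are the observed and expected counts, and is referred to a Chi-square distribution with k − 1 degrees of freedom when no parameters are estimated from the data. This sketch computes it by hand for observed counts [50, 30, 20] against expected counts [40, 40, 20] and checks the result against `scipy.stats.chisquare`.

```python
import numpy as np
from scipy import stats

observed = np.array([50, 30, 20])
expected = np.array([40, 40, 20])  # expected counts under H0 (same total as observed)

# Manual computation of the statistic and its P-value
chi2_manual = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1  # k - 1, no parameters estimated from the data
p_manual = stats.chi2.sf(chi2_manual, dof)

# Same test via SciPy
chi2_scipy, p_scipy = stats.chisquare(observed, expected)

print(f"Manual: chi2 = {chi2_manual:.4f}, P = {p_manual:.4f}")
print(f"SciPy:  chi2 = {chi2_scipy:.4f}, P = {p_scipy:.4f}")
```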

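19. What is the F-distribution, and when is it used in hypothesis testing?

The F-distribution is the distribution of a ratio of two independent Chi-square variables, each divided by its degrees of freedom; in practice it describes the ratio of two variance estimates. It is used in analysis of variance (ANOVA), in tests comparing two variances, and in testing the overall significance of regression models.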

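20. What is an ANOVA test, and what are its assumptions?

ANOVA (Analysis of Variance) tests whether the means of three or more groups are all equal (H₀) or at least one differs (H₁). Its assumptions are independence of observations, approximately normal data (residuals) within each group, and equal variances across groups (homogeneity of variance).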

21. What are the different types of ANOVA tests?


One-way ANOVA (single factor)
Two-way ANOVA (two factors, optionally with their interaction)
Repeated measures ANOVA (the same subjects measured under each condition)
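One-way and two-way ANOVA are demonstrated in the code cells of this notebook; repeated measures ANOVA can be run with statsmodels' `AnovaRM`. The sketch below is a minimal example on simulated, hypothetical data: 12 subjects each measured under three conditions (`before`, `during`, `after`), stored in long format with one row per subject and condition. `AnovaRM` expects balanced data, so every subject is measured exactly once in every condition.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(4)
n_subjects = 12
conditions = ["before", "during", "after"]

# Hypothetical long-format data: one row per (subject, condition)
rows = []
for subject in range(n_subjects):
    baseline = rng.normal(100, 10)  # subject-specific level shared across conditions
    for shift, condition in zip([0, 3, 5], conditions):
        rows.append({"subject": subject,
                     "condition": condition,
                     "score": baseline + shift + rng.normal(0, 3)})
df_rm = pd.DataFrame(rows)

# Within-subject factor: condition
result = AnovaRM(df_rm, depvar="score", subject="subject", within=["condition"]).fit()
print(result)
```

Because each subject serves as their own control, the subject-to-subject baseline differences are removed from the error term, which is the main advantage over a one-way ANOVA on the same numbers.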

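22. What is the F-test, and how does it relate to hypothesis testing?

The F-test uses an F-statistic (a ratio of two variance estimates) to test a null hypothesis. When comparing two groups' variances, F = s₁² / s₂², and a value far from 1 suggests the variances differ. In ANOVA, F = (between-group mean square) / (within-group mean square); a large F indicates that the group means differ by more than within-group variation would explain, so the ANOVA F-test is a test of means, not of variances.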
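The ANOVA F-statistic can be computed directly as the ratio of the between-group mean square to the within-group mean square. This sketch does so for three simulated groups (means 100, 105, 110, σ = 15, n = 30 each; illustrative values drawn with NumPy's newer random generator) and checks the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
groups = [rng.normal(m, 15, 30) for m in (100, 105, 110)]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.mean(np.concatenate(groups))

# Between-group variability: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variability: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual F = {f_manual:.4f}, P = {p_manual:.4g}")
print(f"SciPy  F = {f_scipy:.4f}, P = {p_scipy:.4g}")
```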


#1.Perform a Z-test for comparing a sample mean to a known population mean

import numpy as np
from scipy import stats

def z_test(sample, population_mean, population_std):
    sample_mean = np.mean(sample)
    sample_size = len(sample)
    z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))  # Two-tailed test
    
    print(f"Z-score: {z_score:.4f}")
    print(f"P-value: {p_value:.4f}")

    if p_value < 0.05:
        print("Reject the null hypothesis (significant difference).")
    else:
        print("Fail to reject the null hypothesis (no significant difference).")

# Example usage
np.random.seed(42)
sample_data = np.random.normal(100, 15, 30)  # Sample data
z_test(sample_data, population_mean=105, population_std=15)


#2. Simulate random data for hypothesis testing and calculate the P-value


import numpy as np
from scipy.stats import ttest_1samp

# Simulating data
np.random.seed(42)
sample_data = np.random.normal(loc=100, scale=15, size=30)

# Performing a one-sample T-test
t_stat, p_value = ttest_1samp(sample_data, popmean=105)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")



#3. Implement a one-sample Z-test


from statsmodels.stats.weightstats import ztest

def one_sample_z_test(sample, population_mean):
    z_stat, p_value = ztest(sample, value=population_mean)
    print(f"Z-statistic: {z_stat:.4f}")
    print(f"P-value: {p_value:.4f}")
    
    if p_value < 0.05:
        print("Reject the null hypothesis.")
    else:
        print("Fail to reject the null hypothesis.")

# Example usage
np.random.seed(42)
sample = np.random.normal(100, 15, 30)
one_sample_z_test(sample, 105)



#4. Perform a two-tailed Z-test and visualize decision region



import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# Z-test function
def two_tailed_z_test(sample, pop_mean, pop_std):
    sample_mean = np.mean(sample)
    sample_size = len(sample)
    z_score = (sample_mean - pop_mean) / (pop_std / np.sqrt(sample_size))
    p_value = 2 * (1 - norm.cdf(abs(z_score)))

    # Plot decision regions
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x)

    plt.figure(figsize=(8, 5))
    plt.plot(x, y, label="Standard Normal Distribution")
    plt.axvline(-1.96, color='r', linestyle='dashed', label="Critical value (-1.96)")
    plt.axvline(1.96, color='r', linestyle='dashed', label="Critical value (1.96)")
    plt.fill_between(x, y, where=(x < -1.96) | (x > 1.96), color='red', alpha=0.3)
    plt.axvline(z_score, color='blue', linestyle='solid', label=f"Z-score ({z_score:.2f})")
    plt.legend()
    plt.title("Two-Tailed Z-Test Decision Region")
    plt.show()

    print(f"Z-score: {z_score:.4f}, P-value: {p_value:.4f}")
    if p_value < 0.05:
        print("Reject the null hypothesis.")
    else:
        print("Fail to reject the null hypothesis.")

# Example usage
np.random.seed(42)
sample_data = np.random.normal(100, 15, 30)
two_tailed_z_test(sample_data, 105, 15)



#5. Visualizing Type 1 and Type 2 errors



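import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_type1_type2(effect=2.5, z_crit=1.96):
    # H0: test statistic ~ N(0, 1). H1 (illustrative effect size): statistic ~ N(effect, 1).
    x = np.linspace(-4, 4 + effect, 1000)
    y_null = norm.pdf(x)
    y_alt = norm.pdf(x, loc=effect)

    plt.figure(figsize=(8, 5))
    plt.plot(x, y_null, label="Distribution under H0")
    plt.plot(x, y_alt, label="Distribution under H1")
    # Type I error: H0 is true but the statistic falls in the rejection region
    plt.fill_between(x, y_null, where=(x < -z_crit) | (x > z_crit), color='red', alpha=0.3, label="Type I Error (α)")
    # Type II error: H1 is true but the statistic falls in the acceptance region
    plt.fill_between(x, y_alt, where=(x > -z_crit) & (x < z_crit), color='blue', alpha=0.3, label="Type II Error (β)")

    plt.axvline(-z_crit, color='r', linestyle='dashed')
    plt.axvline(z_crit, color='r', linestyle='dashed')
    plt.legend()
    plt.title("Type I and Type II Errors in Hypothesis Testing")
    plt.show()

plot_type1_type2()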



#6. Perform an independent T-test


from scipy.stats import ttest_ind

np.random.seed(42)
group1 = np.random.normal(100, 15, 30)
group2 = np.random.normal(105, 15, 30)

t_stat, p_value = ttest_ind(group1, group2)
print(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.4f}")



#7. Perform a paired T-test



from scipy.stats import ttest_rel

# Simulating before and after treatment data
np.random.seed(42)
before = np.random.normal(100, 15, 30)
after = before + np.random.normal(2, 5, 30)

t_stat, p_value = ttest_rel(before, after)
print(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.4f}")



#8. Simulate data and compare Z-test and T-test results


np.random.seed(42)
sample = np.random.normal(100, 15, 30)

z_stat, z_p = ztest(sample, value=105)
t_stat, t_p = ttest_1samp(sample, 105)

print(f"Z-test: Z-stat={z_stat:.4f}, P-value={z_p:.4f}")
print(f"T-test: T-stat={t_stat:.4f}, P-value={t_p:.4f}")


#9. Calculate and interpret a confidence interval


def confidence_interval(sample, confidence=0.95):
    sample_mean = np.mean(sample)
    sample_std = np.std(sample, ddof=1)
    n = len(sample)
    margin_error = stats.t.ppf((1 + confidence) / 2, df=n-1) * (sample_std / np.sqrt(n))

    lower_bound = sample_mean - margin_error
    upper_bound = sample_mean + margin_error

    print(f"{confidence*100}% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")
    return lower_bound, upper_bound

# Example usage
np.random.seed(42)
sample = np.random.normal(100, 15, 30)
confidence_interval(sample)






#10. Calculate the Margin of Error for a Given Confidence Level




import numpy as np
from scipy.stats import t

def margin_of_error(sample, confidence=0.95):
    sample_mean = np.mean(sample)
    sample_std = np.std(sample, ddof=1)
    n = len(sample)
    t_critical = t.ppf((1 + confidence) / 2, df=n-1)  # t-critical value
    margin_error = t_critical * (sample_std / np.sqrt(n))

    print(f"Margin of Error: {margin_error:.4f}")
    return margin_error

# Example usage
np.random.seed(42)
sample_data = np.random.normal(100, 15, 30)
margin_of_error(sample_data)



#11. Bayesian Inference using Bayes' Theorem



def bayes_theorem(prior, likelihood, evidence):
    posterior = (likelihood * prior) / evidence
    return posterior

# Example: Disease Testing
prior = 0.01  # Probability of having a disease
likelihood = 0.95  # Sensitivity of test (true positive rate)
false_positive_rate = 0.05  # Probability of false positive
evidence = (likelihood * prior) + (false_positive_rate * (1 - prior))

posterior = bayes_theorem(prior, likelihood, evidence)
print(f"Posterior probability of having the disease given a positive test: {posterior:.4f}")



#12. Chi-Square Test for Independence


import pandas as pd
import scipy.stats as stats

# Simulated contingency table (2x2)
data = np.array([[50, 30], [20, 40]])
chi2_stat, p, dof, expected = stats.chi2_contingency(data)

print(f"Chi-Square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p:.4f}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
print(expected)



#13. Expected Frequencies for Chi-Square Test


def expected_frequencies(observed):
    _, _, _, expected = stats.chi2_contingency(observed)
    return expected

observed_data = np.array([[50, 30], [20, 40]])
expected_freqs = expected_frequencies(observed_data)
print("Expected Frequencies:\n", expected_freqs)



#14. Goodness-of-Fit Test

observed = np.array([50, 30, 20])
expected = np.array([40, 40, 20])
chi2_stat, p = stats.chisquare(observed, expected)

print(f"Chi-Square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p:.4f}")



#15. Simulate and Visualize the Chi-Square Distribution


import matplotlib.pyplot as plt
import scipy.stats as stats

df = 4  # Degrees of freedom
x = np.linspace(0, 20, 100)
y = stats.chi2.pdf(x, df)

plt.plot(x, y, label=f"Chi-Square Distribution (df={df})")
plt.fill_between(x, y, alpha=0.3)
plt.title("Chi-Square Distribution")
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.legend()
plt.show()



#16. F-Test to Compare Variances



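def f_test(sample1, sample2):
    # F-statistic: ratio of the two sample variances
    var1, var2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
    f_stat = var1 / var2
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    # Two-tailed P-value for H0: equal variances (double the smaller tail area)
    p_value = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))

    print(f"F-Statistic: {f_stat:.4f}")
    print(f"P-value: {p_value:.4f}")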

np.random.seed(42)
data1 = np.random.normal(100, 15, 30)
data2 = np.random.normal(105, 10, 30)
f_test(data1, data2)



#17. One-Way ANOVA


from scipy.stats import f_oneway

np.random.seed(42)
group1 = np.random.normal(100, 15, 30)
group2 = np.random.normal(105, 15, 30)
group3 = np.random.normal(110, 15, 30)

f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F-Statistic: {f_stat:.4f}, P-value: {p_value:.4f}")



#18. Check ANOVA Assumptions


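import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

def check_anova_assumptions(*groups):
    # Normality check for each group: Q-Q plot, histogram, and Shapiro-Wilk test
    for i, group in enumerate(groups, start=1):
        sm.qqplot(group, line='s')
        plt.title(f"Q-Q Plot for Normality Check (Group {i})")
        plt.show()

        sns.histplot(group, kde=True)
        plt.title(f"Histogram for Normality Check (Group {i})")
        plt.show()

        w_stat, p_norm = stats.shapiro(group)
        print(f"Group {i}: Shapiro-Wilk W={w_stat:.4f}, P-value={p_norm:.4f}")

    # Equal-variance check across groups: sample variances and Levene's test
    print("Variances:", ", ".join(f"{np.var(g, ddof=1):.2f}" for g in groups))
    lev_stat, p_var = stats.levene(*groups)
    print(f"Levene statistic={lev_stat:.4f}, P-value={p_var:.4f}")

np.random.seed(42)
group1 = np.random.normal(100, 15, 30)
group2 = np.random.normal(105, 15, 30)
group3 = np.random.normal(110, 15, 30)
check_anova_assumptions(group1, group2, group3)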



#19. Two-Way ANOVA



import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated dataset (seeded for reproducibility)
np.random.seed(42)
df = pd.DataFrame({
    "Factor1": np.repeat(["A", "B"], 30),
    "Factor2": np.tile(["X", "Y"], 30),
    "Values": np.random.normal(100, 15, 60)
})

# Two-way ANOVA
model = ols('Values ~ C(Factor1) + C(Factor2) + C(Factor1):C(Factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)



#20. Visualizing F-Distribution



df1, df2 = 10, 20
x = np.linspace(0, 5, 100)
y = stats.f.pdf(x, df1, df2)

plt.plot(x, y, label=f"F-distribution (df1={df1}, df2={df2})")
plt.fill_between(x, y, alpha=0.3)
plt.title("F-Distribution")
plt.xlabel("F-value")
plt.ylabel("Probability Density")
plt.legend()
plt.show()



#21. Z-Test for Proportions



from statsmodels.stats.proportion import proportions_ztest

successes = np.array([40, 50])
samples = np.array([100, 120])

z_stat, p_value = proportions_ztest(successes, samples)
print(f"Z-Statistic: {z_stat:.4f}, P-value: {p_value:.4f}")



#22. Chi-Square Goodness-of-Fit Test with Simulated Data


np.random.seed(42)  # seeded for reproducibility
observed = np.random.randint(20, 50, size=5)
expected = np.full(5, np.mean(observed))
chi2_stat, p_value = stats.chisquare(observed, expected)

print(f"Chi-Square Statistic: {chi2_stat:.4f}, P-value: {p_value:.4f}")
