Q1: What is hypothesis testing in statistics?

**Answer -** Hypothesis testing is a statistical method for making decisions about a population based on sample data. We start with an assumption, called a hypothesis, and then collect data to check whether the data are consistent with it. This lets us draw conclusions about a whole population from a small sample. It's like checking whether what we believe is supported by the data or whether the observed result could be just due to chance.

Q2: What is the null hypothesis, and how does it differ from the alternative hypothesis?

**Answer -** The null hypothesis (H0) is the starting point. It says there is no effect or no change. The alternative hypothesis (H1 or Ha) is what we think might be true instead. So, the null is like “nothing is happening” and the alternative is “something is happening”. We test the data to decide whether we can reject the null.

Q3: What is the significance level in hypothesis testing, and why is it important?

**Answer -** The significance level (denoted by alpha, usually 0.05) is the risk we are willing to accept of rejecting the null hypothesis when it is actually true. It acts as a threshold: if the p-value is less than alpha, we reject the null. It is important because it controls the chance of wrongly rejecting a true null hypothesis.

Q4: What does a P-value represent in hypothesis testing?

**Answer -** The p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. It shows how compatible our sample result is with the null; it is not the probability that the null is true. A smaller p-value means stronger evidence against the null.

Q5: How do you interpret the P-value in hypothesis testing?

**Answer -** If the p-value is smaller than the significance level (like 0.05), we reject the null hypothesis and call the result statistically significant. If the p-value is larger, we don't have enough evidence to reject the null. This does not prove that the null is true.

Q6: What are Type 1 and Type 2 errors in hypothesis testing?

**Answer -** A Type 1 error happens when we reject the null hypothesis even though it is actually true. A Type 2 error happens when we fail to reject the null even though the alternative is true. So basically, Type 1 is a false positive and Type 2 is a false negative.
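
The two error rates can be estimated by simulation. The sketch below is only an illustration under assumed settings: the helper name `rejection_rate`, the effect size of 0.5, the sample size of 30, and the 2000 trials are all hypothetical choices, not part of the question.

```python
import numpy as np
from scipy import stats

def rejection_rate(true_mean, null_mean=0.0, sigma=1.0, n=30,
                   alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated one-sample t-tests that reject H0: mean == null_mean."""
    rng = np.random.default_rng(seed)
    # each row is one simulated sample of size n
    samples = rng.normal(loc=true_mean, scale=sigma, size=(trials, n))
    _, p_values = stats.ttest_1samp(samples, null_mean, axis=1)
    return float(np.mean(p_values < alpha))

# H0 is true here, so every rejection is a Type 1 error (rate should be near alpha)
type1_rate = rejection_rate(true_mean=0.0)

# H0 is false here, so every non-rejection is a Type 2 error
type2_rate = 1 - rejection_rate(true_mean=0.5)

print("Estimated Type 1 error rate:", type1_rate)
print("Estimated Type 2 error rate:", type2_rate)
```

The Type 1 rate should land close to alpha, while the Type 2 rate depends on the effect size and the sample size.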

Q7: What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

**Answer -** In a one-tailed test, we check only one direction (either greater or less). In a two-tailed test, we check both directions (greater and less). For the same alpha, the two-tailed test is stricter in each direction because alpha is split between the two tails of the distribution.
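
A small sketch of how the same test statistic gives different p-values depending on the tails. The value `z = 1.8` is a made-up test statistic used only for illustration.

```python
from scipy.stats import norm

z = 1.8  # hypothetical Z test statistic

p_right = norm.sf(z)         # one-tailed, H1: mean is greater
p_left = norm.cdf(z)         # one-tailed, H1: mean is less
p_two = 2 * norm.sf(abs(z))  # two-tailed, H1: mean is different

print("Right-tailed p-value:", p_right)
print("Left-tailed p-value:", p_left)
print("Two-tailed p-value:", p_two)
```

With this statistic the right-tailed p-value falls below 0.05 but the two-tailed one does not, which is why the direction of the test should be chosen before looking at the data.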

Q8: What is the Z-test, and when is it used in hypothesis testing?

**Answer -** The Z-test is used when the population standard deviation is known and either the data are normally distributed or the sample size is large (usually more than 30). It is used to test a mean or a proportion. It uses the Z-score to measure how far the sample mean is from the hypothesized population mean.

Q9: How do you calculate the Z-score, and what does it represent in hypothesis testing?

**Answer -** Z-score = (sample mean - population mean) / (population standard deviation / √n). It shows how many standard errors the sample mean is away from the population mean. If the Z-score is very high or very low, the sample is unlikely under the null hypothesis.
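
As a worked example of the formula, assume hypothetical values: a sample mean of 52, a population mean of 50, a population standard deviation of 10, and a sample size of 25.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{52 - 50}{10 / \sqrt{25}}
  = \frac{2}{2}
  = 1
```

Here the sample mean is one standard error above the population mean, which is not unusual under the null hypothesis.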

Q10: What is the T-distribution, and when should it be used instead of the normal distribution?

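**Answer -** The t-distribution is similar to the normal distribution but has fatter tails. It is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample size is small (less than 30). As the sample size grows, it approaches the normal distribution.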
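
The fatter tails show up as larger critical values. A minimal sketch comparing the two-tailed 5% critical values, with the degrees of freedom picked arbitrarily for illustration:

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-tailed 5% critical value for the normal distribution

# critical values of the t-distribution for a few degrees of freedom
t_crits = {df: t.ppf(0.975, df) for df in (5, 10, 30, 100)}

for df, t_crit in t_crits.items():
    print(f"df={df:>3}: t critical = {t_crit:.3f} (normal: {z_crit:.3f})")
```

The t critical value is always above the normal one and gets closer to it as the degrees of freedom increase.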

Q11: What is the difference between a Z-test and a T-test?

**Answer -** The Z-test is for large samples with a known population standard deviation. The T-test is for small samples or an unknown standard deviation. Also, the T-test uses the t-distribution while the Z-test uses the normal distribution.

Q12: What is the T-test, and how is it used in hypothesis testing?

**Answer -** The T-test is a statistical test for comparing means. It is used to check whether a sample mean differs from a hypothesized value or whether the means of two groups differ. We use it when the population standard deviation is unknown, especially with small samples. There are different types: one-sample, two-sample (independent), and paired t-tests.

Q13: What is the relationship between Z-test and T-test in hypothesis testing?

**Answer -** Both are used for comparing means, but under different conditions. When the sample is large and the population standard deviation is known, we use the Z-test. When the sample is small or we don’t know the standard deviation, we use the T-test.

Q14: What is a confidence interval, and how is it used to interpret statistical results?

**Answer -** A confidence interval gives a range of plausible values for the true value. For example, a 95% confidence interval comes from a procedure that captures the true value in about 95% of repeated samples. It helps us understand the uncertainty in an estimate.
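
One way to read a 95% confidence level is as a long-run coverage rate. The simulation below is a sketch under assumed settings (true mean 50, standard deviation 5, samples of size 20, 2000 repetitions, all chosen for illustration) that counts how often a t-based interval contains the true mean.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(42)
true_mean, sigma, n, trials = 50.0, 5.0, 20, 2000

# each row is one simulated sample
samples = rng.normal(true_mean, sigma, size=(trials, n))
means = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)
margin = t.ppf(0.975, df=n - 1) * se

# check whether each interval contains the true mean
covered = (means - margin <= true_mean) & (true_mean <= means + margin)
coverage = covered.mean()

print("Fraction of intervals containing the true mean:", coverage)
```

The fraction should come out close to 0.95.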

Q15: What is the margin of error, and how does it affect the confidence interval?

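**Answer -** Margin of error is half the width of the confidence interval: the interval is the estimate plus or minus the margin of error. A bigger margin means a wider confidence interval, and a smaller margin means a more precise result. It depends on the confidence level, the sample size, and the standard deviation.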
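
A minimal sketch of how the margin of error shrinks as the sample size grows. It assumes a known population standard deviation so that a Z critical value applies; the helper name `z_margin_of_error` and the numbers are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def z_margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean when the population std (sigma) is known."""
    z_crit = norm.ppf((1 + confidence) / 2)
    return z_crit * sigma / np.sqrt(n)

for n in (25, 100, 400):
    print(f"n={n:>3}: margin of error = {z_margin_of_error(sigma=10, n=n):.3f}")
```

Because the margin depends on √n, the sample size has to be four times larger to halve the margin of error.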

Q16: How is Bayes' Theorem used in statistics, and what is its significance?

**Answer -** Bayes' Theorem helps us update our beliefs based on new data. It is very useful in probability and decision-making. It combines prior knowledge with new evidence to calculate updated probabilities.
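
A small sketch of the update step for a diagnostic test. The function name `posterior_probability` and the numbers (1% prior, 99% sensitivity, 5% false positive rate) are assumptions chosen for illustration. The probability of a positive test is built with the law of total probability.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) using Bayes' Theorem."""
    # P(positive) = P(pos | cond) * P(cond) + P(pos | no cond) * P(no cond)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

posterior = posterior_probability(prior=0.01, sensitivity=0.99,
                                  false_positive_rate=0.05)
print("P(condition | positive test):", posterior)
```

Even with an accurate test, the posterior stays well below 50% here because the condition is rare to begin with.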

Q17: What is the Chi-square distribution, and when is it used?

**Answer -** The Chi-square distribution is the distribution of a sum of squared standard normal variables. It is right-skewed and takes only non-negative values. It is mostly used in tests on categorical data, like the goodness of fit test or the test of independence.

Q18: What is the Chi-square goodness of fit test, and how is it applied?

**Answer -** The goodness of fit test checks whether the observed counts match an expected distribution. For example, if we expect equal numbers of red, blue, and green balls, we can use this test to see whether the real counts match that.
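
The ball example can be sketched by computing the statistic by hand and checking it against `scipy.stats.chisquare`. The counts below are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([28, 35, 27])               # hypothetical red, blue, green counts
expected = np.full(3, observed.sum() / 3)       # equal counts expected under H0

# chi-square statistic: sum of (observed - expected)^2 / expected
stat = np.sum((observed - expected) ** 2 / expected)
p_value = chi2.sf(stat, df=len(observed) - 1)   # df = number of categories - 1

# the same test done by SciPy
scipy_stat, scipy_p = chisquare(f_obs=observed, f_exp=expected)

print("By hand -> statistic:", stat, "p-value:", p_value)
print("SciPy   -> statistic:", scipy_stat, "p-value:", scipy_p)
```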

Q19: What is the F-distribution, and when is it used in hypothesis testing?

**Answer -** The F-distribution is the distribution of a ratio of two scaled variances. It is used for comparing variances between groups, in ANOVA and regression models. It is right-skewed, like the chi-square distribution.

Q20: What is an ANOVA test, and what are its assumptions?

**Answer -** ANOVA (Analysis of Variance) checks whether the means of three or more groups are different. It assumes that the samples are independent, normally distributed, and have equal variances. If ANOVA shows significance, it means at least one group mean differs from the others.
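
To see where the ANOVA F-statistic comes from, the sketch below computes it by hand as the ratio of between-group to within-group mean squares and compares it with `scipy.stats.f_oneway`. The three small groups are made-up data.

```python
import numpy as np
from scipy.stats import f, f_oneway

# hypothetical groups
groups = [np.array([5.0, 6.0, 7.0]),
          np.array([8.0, 9.0, 10.0]),
          np.array([11.0, 12.0, 13.0])]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# variation of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# variation of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_stat = ms_between / ms_within
p_value = f.sf(f_stat, k - 1, n_total - k)

scipy_f, scipy_p = f_oneway(*groups)
print("By hand -> F:", f_stat, "p-value:", p_value)
print("SciPy   -> F:", scipy_f, "p-value:", scipy_p)
```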

Q21: What are the different types of ANOVA tests?

**Answer -** There are mainly three types: One-way ANOVA (for one factor), Two-way ANOVA (two factors), and Repeated Measures ANOVA (same subject tested multiple times). Each has different uses based on the design.

Q22: What is the F-test, and how does it relate to hypothesis testing?

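**Answer -** The F-test is used to compare two variances. It is the basis for the ANOVA test. If the F value is large and the p-value is low, we reject the null and conclude that the variances differ significantly. In ANOVA, the same ratio compares the variance between groups to the variance within groups, so a large F indicates that the group means differ.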



1. Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and interpret the results.

In [None]:
from scipy import stats
import numpy as np

# sample data
sample = [12, 15, 14, 10, 13, 16, 14, 12, 13, 15]
sample_mean = np.mean(sample)
population_mean = 13
population_std = 2  # known population standard deviation
n = len(sample)

# Z-test
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(n))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print("Z-score:", z_score)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject null hypothesis. Sample mean is significantly different.")
else:
    print("Fail to reject null hypothesis. No significant difference found.")


2. Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python.


In [None]:
import numpy as np
from scipy import stats

np.random.seed(0)

# simulate 100 values from normal distribution
data = np.random.normal(loc=50, scale=5, size=100)

# population mean to test against
pop_mean = 52

# one-sample t-test
t_stat, p_val = stats.ttest_1samp(data, pop_mean)

print("T-statistic:", t_stat)
print("P-value:", p_val)

if p_val < 0.05:
    print("Reject null, the sample mean is significantly different.")
else:
    print("Fail to reject null, no significant difference found.")


3. Implement a one-sample Z-test using Python to compare the sample mean with the population mean.


In [None]:
import numpy as np
from scipy.stats import norm

# given sample
data = np.array([25, 27, 24, 26, 28, 29, 30])
sample_mean = np.mean(data)
pop_mean = 26
pop_std = 2  # known

n = len(data)
z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
p = 2 * (1 - norm.cdf(abs(z)))

print("Z-Statistic:", z)
print("P-Value:", p)

if p < 0.05:
    print("Sample is significantly different from population mean.")
else:
    print("No significant difference from population mean.")


4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# sample data
sample = [55, 52, 53, 54, 56, 58, 52, 54, 57, 53]
sample_mean = np.mean(sample)
pop_mean = 54
pop_std = 2
n = len(sample)

# Z-test
z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z)))

print("Z-score:", z)
print("P-value:", p_value)

# Visualize
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)

plt.plot(x, y, label='Normal Distribution')
plt.axvline(z, color='red', linestyle='--', label=f'Z-score = {z:.2f}')
plt.axvline(-1.96, color='green', linestyle=':', label='Critical Z = ±1.96')
plt.axvline(1.96, color='green', linestyle=':')

plt.title('Two-tailed Z-test Decision Region')
plt.legend()
plt.grid(True)
plt.show()


5. Create a Python function that visualizes Type I and Type II errors in hypothesis testing.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_type_errors(mu0, mu1, sigma, alpha=0.05):
    z_critical = norm.ppf(1 - alpha/2)
    x = np.linspace(mu0 - 4*sigma, mu1 + 4*sigma, 1000)

    y_null = norm.pdf(x, mu0, sigma)
    y_alt = norm.pdf(x, mu1, sigma)

    plt.plot(x, y_null, label='Null Hypothesis (H0)', color='blue')
    plt.plot(x, y_alt, label='Alternative Hypothesis (H1)', color='orange')

    critical_left = mu0 - z_critical * sigma
    critical_right = mu0 + z_critical * sigma

    plt.axvline(critical_left, color='red', linestyle='--', label='Critical Region')
    plt.axvline(critical_right, color='red', linestyle='--')

    plt.fill_between(x, 0, y_null, where=(x < critical_left) | (x > critical_right), color='blue', alpha=0.3, label='Type I Error')
    plt.fill_between(x, 0, y_alt, where=(x >= critical_left) & (x <= critical_right), color='orange', alpha=0.3, label='Type II Error')

    plt.title('Type I and Type II Errors')
    plt.legend()
    plt.grid(True)
    plt.show()

plot_type_errors(mu0=50, mu1=52, sigma=3)


6. Write a Python program to perform an independent T-test and interpret the results.


In [None]:
from scipy.stats import ttest_ind
import numpy as np

# two independent samples
group1 = [22, 25, 27, 24, 28, 26, 23]
group2 = [30, 32, 28, 29, 33, 31, 30]

# perform t-test
t_stat, p_value = ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("There is significant difference between the two groups.")
else:
    print("There is no significant difference between the groups.")


7. Perform a paired sample T-test using Python and visualize the comparison results.


In [None]:
from scipy.stats import ttest_rel
import matplotlib.pyplot as plt

before = [80, 85, 78, 90, 87]
after = [83, 88, 79, 91, 89]

# Paired t-test
t_stat, p_val = ttest_rel(before, after)

print("T-statistic:", t_stat)
print("P-value:", p_val)

# Visualizing change
plt.plot(before, label="Before", marker='o')
plt.plot(after, label="After", marker='x')
plt.title("Before vs After - Paired T-test")
plt.legend()
plt.grid(True)
plt.show()


8. Simulate data and perform both a Z-test and a T-test on it, then compare the results using Python.


In [None]:
import numpy as np
from scipy.stats import ttest_1samp, norm

np.random.seed(1)
data = np.random.normal(loc=100, scale=15, size=25)

# Known population std
pop_mean = 105
pop_std = 15

# Z-test
z = (np.mean(data) - pop_mean) / (pop_std / np.sqrt(len(data)))
z_p = 2 * (1 - norm.cdf(abs(z)))

# T-test
t_stat, t_p = ttest_1samp(data, pop_mean)

print("Z-test -> Z:", z, "P-value:", z_p)
print("T-test -> T:", t_stat, "P-value:", t_p)


9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance.


In [None]:
import numpy as np
from scipy.stats import t

def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    n = len(data)
    se = np.std(data, ddof=1) / np.sqrt(n)
    t_crit = t.ppf((1 + confidence) / 2, df=n-1)
    margin = t_crit * se
    return mean - margin, mean + margin

# Example
data = [45, 50, 47, 49, 48, 46]
ci = confidence_interval(data)
print("Confidence Interval:", ci)


10. Write a Python program to calculate the margin of error for a given confidence level using sample data.


In [None]:
import numpy as np
from scipy.stats import t

def margin_of_error(data, confidence=0.95):
    n = len(data)
    se = np.std(data, ddof=1) / np.sqrt(n)
    t_val = t.ppf((1 + confidence) / 2, n - 1)
    return t_val * se

# Sample data
sample = [200, 210, 205, 215, 220]
moe = margin_of_error(sample)
print("Margin of Error:", moe)


11. Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process.


In [None]:
def bayes_theorem(prior_A, prob_B_given_A, prob_B):
    return (prob_B_given_A * prior_A) / prob_B

# Example:
# P(Disease) = 0.01
# P(Positive | Disease) = 0.99
# P(Positive) = 0.05

posterior = bayes_theorem(0.01, 0.99, 0.05)
print("Posterior Probability (Disease | Positive Test):", posterior)


12. Perform a Chi-square test for independence between two categorical variables in Python.


In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

# Example: Gender vs Preference
data = pd.DataFrame({
    'Likes': [30, 20],
    'Dislikes': [10, 40]
}, index=['Male', 'Female'])

chi2, p, dof, expected = chi2_contingency(data)

print("Chi-square Statistic:", chi2)
print("P-value:", p)
print("Expected Frequencies:\n", expected)


13. Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data.

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

# Observed data
data = pd.DataFrame({
    'Yes': [25, 30],
    'No': [15, 20]
}, index=['Group A', 'Group B'])

# Calculate expected freq
chi2, p, dof, expected = chi2_contingency(data)

print("Expected Frequencies:\n", expected)


14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution


In [None]:
from scipy.stats import chisquare

# observed vs expected
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]  # uniform expected dist

chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_val)


15. Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

df = 4  # degrees of freedom
x = np.linspace(0, 20, 500)
y = chi2.pdf(x, df)

plt.plot(x, y, label=f'df = {df}')
plt.title("Chi-square Distribution")
plt.xlabel("X")
plt.ylabel("Probability Density")
plt.grid(True)
plt.legend()
plt.show()


16. Implement an F-test using Python to compare the variances of two random samples.


In [None]:
import numpy as np
from scipy.stats import f

group1 = [21, 23, 20, 22, 24]
group2 = [30, 35, 32, 33, 34]

var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)

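# put the larger variance in the numerator and keep the matching degrees of freedom
if var1 >= var2:
    f_stat, d1, d2 = var1 / var2, len(group1) - 1, len(group2) - 1
else:
    f_stat, d1, d2 = var2 / var1, len(group2) - 1, len(group1) - 1

# two-tailed p-value, because the larger variance was chosen after seeing the data
p_val = min(1.0, 2 * f.sf(f_stat, d1, d2))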

print("F-statistic:", f_stat)
print("P-value:", p_val)


17. Write a Python program to perform an ANOVA test to compare means between multiple groups and interpret the results.

In [None]:
from scipy.stats import f_oneway

group1 = [88, 90, 85, 87]
group2 = [78, 76, 75, 79]
group3 = [92, 94, 91, 93]

f_stat, p_val = f_oneway(group1, group2, group3)

print("F-statistic:", f_stat)
print("P-value:", p_val)


18. Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results.


In [None]:
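import matplotlib.pyplot as plt
from scipy.stats import f_oneway

# same groups as in the previous example, repeated so this cell runs on its own
group1 = [88, 90, 85, 87]
group2 = [78, 76, 75, 79]
group3 = [92, 94, 91, 93]
groups = [group1, group2, group3]

# one-way ANOVA
f_stat, p_val = f_oneway(*groups)
print("F-statistic:", f_stat)
print("P-value:", p_val)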

# boxplot
plt.boxplot(groups, labels=["Group 1", "Group 2", "Group 3"])
plt.title("Group Comparison using One-way ANOVA")
plt.ylabel("Scores")
plt.grid(True)
plt.show()


19. Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA.


In [None]:
import scipy.stats as stats
import numpy as np

def check_anova_assumptions(*groups):
    # Normality test
    for i, g in enumerate(groups):
        stat, p = stats.shapiro(g)
        print(f"Group {i+1} normality p-value:", p)

    # Equal variance test
    stat, p = stats.levene(*groups)
    print("Levene’s test for equal variances p-value:", p)

# Example groups
g1 = [14, 15, 16, 13, 17]
g2 = [22, 20, 21, 23, 19]
g3 = [30, 29, 28, 31, 27]

check_anova_assumptions(g1, g2, g3)


20. Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the results.

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

# sample data
data = pd.DataFrame({
    'Score': [80, 85, 78, 90, 88, 75, 70, 72, 68, 71, 60, 63],
    'Method': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'Gender': ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'F', 'M', 'F']
})

model = ols('Score ~ C(Method) + C(Gender) + C(Method):C(Gender)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


21. Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

x = np.linspace(0, 5, 500)
df1, df2 = 5, 10
y = f.pdf(x, df1, df2)

plt.plot(x, y, label=f"df1={df1}, df2={df2}")
plt.title("F-distribution")
plt.xlabel("F value")
plt.ylabel("Density")
plt.grid(True)
plt.legend()
plt.show()


22. Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means.

In [None]:
import matplotlib.pyplot as plt
from scipy.stats import f_oneway

group1 = [10, 12, 13, 11]
group2 = [14, 15, 13, 16]
group3 = [18, 17, 19, 20]

f_stat, p_val = f_oneway(group1, group2, group3)
print("F-statistic:", f_stat)
print("P-value:", p_val)

plt.boxplot([group1, group2, group3], labels=["G1", "G2", "G3"])
plt.title("One-way ANOVA - Boxplot")
plt.grid(True)
plt.show()


23. Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means

In [None]:
import numpy as np
from scipy.stats import ttest_1samp

np.random.seed(5)
data = np.random.normal(loc=100, scale=10, size=30)

# Test if mean = 102
t_stat, p_val = ttest_1samp(data, 102)
print("T-statistic:", t_stat)
print("P-value:", p_val)


24. Perform a hypothesis test for population variance using a Chi-square distribution and interpret the results

In [None]:
import numpy as np
from scipy.stats import chi2

data = [20, 22, 21, 19, 23, 20, 22]
sample_var = np.var(data, ddof=1)
n = len(data)
pop_var = 4

chi_sq = (n - 1) * sample_var / pop_var
p_val = 2 * min(chi2.cdf(chi_sq, df=n-1), 1 - chi2.cdf(chi_sq, df=n-1))

print("Chi-square stat:", chi_sq)
print("P-value:", p_val)


25. Write a Python script to perform a Z-test for comparing proportions between two datasets or groups

In [None]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

success = [45, 30]
nobs = [100, 100]

z_stat, p_val = proportions_ztest(success, nobs)

print("Z-statistic:", z_stat)
print("P-value:", p_val)


26. Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results.

In [None]:
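import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

group1 = [24, 25, 22, 23, 24]
group2 = [30, 35, 28, 32, 31]

var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)

# larger variance in the numerator; both groups have the same size, so the df are equal
f_stat = max(var1, var2) / min(var1, var2)
df = len(group1) - 1
p_val = min(1.0, 2 * f.sf(f_stat, df, df))  # two-tailed p-value
print("F-statistic:", f_stat)
print("P-value:", p_val)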

plt.hist(group1, alpha=0.5, label='Group 1')
plt.hist(group2, alpha=0.5, label='Group 2')
plt.title("Variance Comparison - Histogram")
plt.legend()
plt.show()


27. Perform a Chi-square test for goodness of fit with simulated data and analyze the results

In [None]:
import numpy as np
from scipy.stats import chisquare

np.random.seed(0)

# simulate 100 draws over 5 equally likely categories
observed = np.random.multinomial(100, [0.2] * 5)
expected = [20, 20, 20, 20, 20]  # expected uniform distribution

chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square statistic:", chi2_stat)
print("P-value:", p_val)
