


---

1. What is hypothesis testing in statistics?

Hypothesis testing is a statistical method for making decisions or inferences about a population from sample data. It involves formulating a null and an alternative hypothesis and using the data to assess whether there is enough evidence to reject the null in favor of the alternative.


---

2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

Null hypothesis (H₀): A default claim that there is no effect or difference (e.g., "The mean is equal to X").

Alternative hypothesis (H₁ or Ha): A statement that contradicts the null (e.g., "The mean is not equal to X").
They are mutually exclusive.



---

3. What is the significance level in hypothesis testing, and why is it important?

The significance level (α) is the threshold for rejecting the null hypothesis, commonly set at 0.05. It represents the probability of making a Type I error—rejecting a true null hypothesis.


---

4. What does a P-value represent in hypothesis testing?

The P-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.


---

5. How do you interpret the P-value in hypothesis testing?

If P-value ≤ α, reject the null hypothesis (evidence supports the alternative).

If P-value > α, fail to reject the null hypothesis (insufficient evidence).
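
The decision rule above can be sketched as a small helper. The function name `decide` and the example P-values are illustrative assumptions, not part of the original notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is at most alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))  # P-value below alpha
print(decide(0.20))  # P-value above alpha
```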



---

6. What are Type 1 and Type 2 errors in hypothesis testing?

Type I error (α): Rejecting a true null hypothesis.

Type II error (β): Failing to reject a false null hypothesis.
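
As a rough illustration of a Type I error, the sketch below repeatedly samples from a population where the null hypothesis is true and counts how often a two-tailed Z-test rejects it anyway. The seed, sample size, and number of trials are arbitrary assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha, n, trials = 0.05, 30, 5000

# Draw every sample from a population where H0 (mean = 0, sigma = 1) is true.
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
z = samples.mean(axis=1) / (1.0 / np.sqrt(n))
p_values = 2 * norm.sf(np.abs(z))  # two-tailed P-values

# Any rejection here is a Type I error, because H0 is true by construction.
type1_rate = np.mean(p_values <= alpha)
print(f"Observed Type I error rate: {type1_rate:.3f}")
```

The observed rate should land close to α, since α is by definition the long-run rate of rejecting a true null.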



---

7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

One-tailed test: Tests for effect in one direction (e.g., greater than).

Two-tailed test: Tests for effect in both directions (e.g., not equal to).
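
The difference shows up directly in the P-value. The sketch below uses a hypothetical test statistic of z = 1.8 and compares the upper-tail and two-tailed P-values from the standard normal distribution.

```python
from scipy.stats import norm

z = 1.8       # hypothetical test statistic
alpha = 0.05

p_one_tailed = norm.sf(z)           # H1: mean > mu0 (upper tail only)
p_two_tailed = 2 * norm.sf(abs(z))  # H1: mean != mu0 (both tails)

print(f"One-tailed P = {p_one_tailed:.4f}, reject H0: {p_one_tailed <= alpha}")
print(f"Two-tailed P = {p_two_tailed:.4f}, reject H0: {p_two_tailed <= alpha}")
```

With these illustrative numbers the one-tailed test rejects H0 at α = 0.05 while the two-tailed test does not, which is why the direction of the test should be fixed before looking at the data.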



---

8. What is the Z-test, and when is it used in hypothesis testing?

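A Z-test is used to test hypotheses about population means when the population variance is known, or when the sample size is large enough (commonly n > 30) that the sample mean is approximately normal and the sample standard deviation is a good stand-in for the population value.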


---

9. How do you calculate the Z-score, and what does it represent in hypothesis testing?

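Z = (X̄ - μ) / (σ / √n)
Here X̄ is the sample mean, μ is the population mean under the null hypothesis, σ is the population standard deviation, and n is the sample size. The Z-score shows how many standard errors (σ / √n) the sample mean lies from μ.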
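
A short worked example of the formula, using made-up values for the sample mean, population mean, standard deviation, and sample size:

```python
import math

x_bar = 103.0  # sample mean (assumed)
mu = 100.0     # population mean under H0 (assumed)
sigma = 15.0   # known population standard deviation (assumed)
n = 36         # sample size (assumed)

standard_error = sigma / math.sqrt(n)  # 15 / 6 = 2.5
z = (x_bar - mu) / standard_error      # 3 / 2.5 = 1.2
print(f"Standard error = {standard_error}, Z = {z}")
```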


---

10. What is the T-distribution, and when should it be used instead of the normal distribution?

The T-distribution is used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small (n < 30). It is similar to the normal distribution but has heavier tails.


---

11. What is the difference between a Z-test and a T-test?

Z-test: Known population variance, or a large sample size.

T-test: Unknown population variance estimated from the sample; especially important for small sample sizes.



---

12. What is the T-test, and how is it used in hypothesis testing?

A T-test is used to compare sample means or a sample mean against a known value, especially with small sample sizes or unknown variance.


---

13. What is the relationship between Z-test and T-test in hypothesis testing?

Both test means, but a T-test is more general and applicable when population variance is unknown. As sample size increases, the T-distribution approaches the normal distribution, making T-tests converge to Z-tests.
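
One way to see this convergence is to compare two-tailed critical values at α = 0.05. The degrees of freedom below are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-tailed critical value at alpha = 0.05
t_crits = {df: t.ppf(0.975, df) for df in (5, 30, 1000)}

print(f"z critical: {z_crit:.4f}")
for df, crit in t_crits.items():
    print(f"t critical (df={df}): {crit:.4f}")
```

The t critical values shrink toward the z critical value as the degrees of freedom grow, so for large samples the two tests give nearly the same answer.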


---

14. What is a confidence interval, and how is it used to interpret statistical results?

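A confidence interval is a range of plausible values for a population parameter, computed from sample data at a given confidence level (e.g., 95%). A 95% level means that, over repeated sampling, about 95% of intervals built this way would contain the true parameter. If the interval excludes a hypothesized value, that value is rejected at the corresponding significance level.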


---

15. What is the margin of error, and how does it affect the confidence interval?

The margin of error is the amount added to and subtracted from the point estimate to form the confidence interval. A larger margin gives a wider interval, indicating more uncertainty.


---

16. How is Bayes' Theorem used in statistics, and what is its significance?

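Bayes' Theorem updates the probability of a hypothesis in light of new evidence: P(A|B) = P(B|A) · P(A) / P(B). It is the foundation of Bayesian statistics, where a prior belief is combined with the likelihood of the observed data to give a posterior probability.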


---

17. What is the Chi-square distribution, and when is it used?

The Chi-square distribution is used for categorical data analysis, such as tests of independence and goodness-of-fit. It’s skewed right and defined only for non-negative values.


---

18. What is the Chi-square goodness of fit test, and how is it applied?

It tests whether observed categorical data fit a specified distribution. It compares observed and expected frequencies using the Chi-square statistic, the sum over categories of (observed - expected)² / expected.
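
To make the statistic concrete, the sketch below computes it by hand for three categories and reads the P-value from the Chi-square distribution. The counts are illustrative, and the degrees of freedom assume no parameters were estimated from the data.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([20, 30, 50])  # illustrative counts
expected = np.array([25, 25, 50])  # counts implied by the hypothesized distribution

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1            # number of categories minus one
p_value = chi2.sf(chi2_stat, dof)  # upper-tail probability

print(f"Chi2 = {chi2_stat:.2f}, dof = {dof}, P = {p_value:.3f}")
```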


---

19. What is the F-distribution, and when is it used in hypothesis testing?

The F-distribution is used to compare variances and in ANOVA. It arises as the distribution of a ratio of two variance estimates, takes only non-negative values, and is right-skewed.


---

20. What is an ANOVA test, and what are its assumptions?

ANOVA (Analysis of Variance) tests whether there are statistically significant differences between three or more group means.
Assumptions:

Independence

Normality

Equal variances (homogeneity)
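
A minimal one-way ANOVA sketch using `scipy.stats.f_oneway`. The three groups are small made-up samples, so the assumptions listed above are taken for granted rather than checked.

```python
from scipy.stats import f_oneway

# Three small made-up groups; the third has a clearly higher mean.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [5, 6, 7]

f_stat, p_val = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, P = {p_val:.4f}")
```

A small P-value says only that at least one group mean differs; it does not say which one.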



---

21. What are the different types of ANOVA tests?

One-way ANOVA: One independent variable.

Two-way ANOVA: Two independent variables.

Repeated measures ANOVA: Repeated observations on the same subjects.



---

22. What is the F-test, and how does it relate to hypothesis testing?

An F-test is used to compare two variances or test the overall significance in ANOVA. It determines whether group means or variances differ significantly.
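
For the two-variance case, the statistic is the ratio of the sample variances, and the P-value can be read from the F-distribution. The sketch below does this by hand on two illustrative samples and assumes both come from normal populations.

```python
import numpy as np
from scipy.stats import f

a = np.array([1, 2, 3, 4, 5])   # illustrative sample 1
b = np.array([2, 4, 6, 8, 10])  # illustrative sample 2

var_a = np.var(a, ddof=1)  # sample variances (ddof=1)
var_b = np.var(b, ddof=1)
f_stat = var_a / var_b
df_a, df_b = len(a) - 1, len(b) - 1

# Two-sided P-value: double the smaller tail probability, capped at 1.
tail = min(f.cdf(f_stat, df_a, df_b), f.sf(f_stat, df_a, df_b))
p_value = min(1.0, 2 * tail)
print(f"F = {f_stat:.3f}, P = {p_value:.3f}")
```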


PRACTICAL

1. Generate a Random Variable and Display its Value

import numpy as np

x = np.random.rand()
print("Random Variable:", x)


---

2. Discrete Uniform Distribution and Plot PMF

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

low, high = 1, 7  # Simulating dice roll (1 to 6)
x = np.arange(low, high)
pmf = randint.pmf(x, low, high)

plt.stem(x, pmf)
plt.title("PMF of Discrete Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Probability")
plt.show()


---

3. Bernoulli Distribution PMF

from scipy.stats import bernoulli

def bernoulli_pmf(p):
    # Bernoulli is discrete, so it has a probability mass function over {0, 1}
    x = [0, 1]
    probs = bernoulli.pmf(x, p)
    return dict(zip(x, probs))

print(bernoulli_pmf(0.6))


---

4. Simulate Binomial Distribution and Plot Histogram

import numpy as np
import matplotlib.pyplot as plt

n, p = 10, 0.5
samples = np.random.binomial(n, p, 1000)

plt.hist(samples, bins=range(n+2), density=True, alpha=0.7, edgecolor='black')
plt.title("Binomial Distribution Histogram")
plt.xlabel("Number of Successes")
plt.ylabel("Relative Frequency")
plt.show()


---

5. Create Poisson Distribution and Visualize

import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt

mu = 3
x = np.arange(0, 10)
pmf = poisson.pmf(x, mu)

plt.stem(x, pmf)
plt.title("Poisson Distribution (μ=3)")
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.show()


---

6. CDF of Discrete Uniform Distribution

from scipy.stats import randint
import matplotlib.pyplot as plt
import numpy as np

low, high = 1, 7
x = np.arange(low, high)
cdf = randint.cdf(x, low, high)

plt.step(x, cdf, where='post')
plt.title("CDF of Discrete Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.grid(True)
plt.show()


---

7. Continuous Uniform Distribution and Visualization

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

samples = uniform.rvs(loc=0, scale=1, size=1000)
plt.hist(samples, bins=30, density=True, alpha=0.7, edgecolor='black')
plt.title("Continuous Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()


---

8. Simulate Normal Distribution and Plot Histogram

import numpy as np
import matplotlib.pyplot as plt

samples = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(samples, bins=30, density=True, alpha=0.7, edgecolor='black')
plt.title("Normal Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()


---

9. Calculate Z-scores and Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

data = np.random.normal(10, 2, 100)
z_scores = zscore(data)

plt.hist(z_scores, bins=30, edgecolor='black')
plt.title("Z-scores of Dataset")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()


---

10. Central Limit Theorem Simulation

import numpy as np
import matplotlib.pyplot as plt

# Exponential distribution (non-normal)
population = np.random.exponential(scale=2, size=100000)

sample_means = []
for _ in range(1000):
    sample = np.random.choice(population, size=30)
    sample_means.append(np.mean(sample))

plt.hist(sample_means, bins=30, density=True, edgecolor='black', alpha=0.7)
plt.title("CLT: Sample Means from Exponential Distribution")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.show()


---

11. Simulate Normal Samples and Verify CLT

import numpy as np
import matplotlib.pyplot as plt

means = []
for _ in range(1000):
    sample = np.random.normal(loc=50, scale=10, size=30)
    means.append(np.mean(sample))

plt.hist(means, bins=30, density=True, edgecolor='black')
plt.title("CLT: Distribution of Sample Means")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.show()


---

12. Standard Normal Distribution Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_standard_normal():
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x)
    plt.plot(x, y)
    plt.title("Standard Normal Distribution")
    plt.xlabel("Z")
    plt.ylabel("PDF")
    plt.grid(True)
    plt.show()

plot_standard_normal()


---

13. Binomial Probabilities for Random Variables

import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
x = np.arange(0, n+1)
probabilities = binom.pmf(x, n, p)
print(dict(zip(x, probabilities)))


---

14. Z-score for a Data Point

def calculate_z_score(x, mu, sigma):
    z = (x - mu) / sigma
    return z

z = calculate_z_score(75, 70, 5)
print("Z-score:", z)


---

15. Hypothesis Testing using Z-statistic

import numpy as np
from scipy.stats import norm

# Sample
data = np.random.normal(loc=100, scale=15, size=30)
sample_mean = np.mean(data)
pop_mean = 95
pop_std = 15
n = len(data)

z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
p_value = norm.sf(z)  # one-tailed (upper-tail) P-value for H1: mean > pop_mean

print(f"Z: {z:.2f}, P-value: {p_value:.4f}")


---

16. Confidence Interval from Data

import numpy as np
import scipy.stats as stats

data = np.random.normal(loc=50, scale=5, size=100)
mean = np.mean(data)
sem = stats.sem(data)
ci = stats.t.interval(0.95, len(data)-1, loc=mean, scale=sem)

print("95% Confidence Interval:", ci)


---

17. CI from Normal Distribution and Interpretation

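import numpy as np
import scipy.stats as stats

data = np.random.normal(loc=60, scale=10, size=50)
mean = np.mean(data)
sem = stats.sem(data)
ci = stats.t.interval(0.95, df=len(data)-1, loc=mean, scale=sem)

print(f"Sample Mean = {mean:.2f}")
print(f"95% CI for Mean = ({ci[0]:.2f}, {ci[1]:.2f})")
# Interpretation: if sampling were repeated many times, about 95% of the
# intervals built this way would contain the true population mean.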


---

18. Normal PDF Visualization

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-5, 5, 1000)
pdf = norm.pdf(x, loc=0, scale=1)

plt.plot(x, pdf)
plt.title("PDF of Normal Distribution")
plt.xlabel("x")
plt.ylabel("Density")
plt.grid(True)
plt.show()


---

19. Poisson CDF Calculation

import numpy as np
from scipy.stats import poisson

mu = 3
x = np.arange(0, 10)
cdf = poisson.cdf(x, mu)
print(dict(zip(x, cdf)))


---

20. Continuous Uniform Random Variable and Expected Value

import numpy as np
from scipy.stats import uniform

a, b = 5, 15
sample = uniform.rvs(loc=a, scale=b-a, size=1000)
expected_value = np.mean(sample)
print(f"Expected Value (approx): {expected_value:.2f}")
print(f"Expected Value (theoretical): {(a + b) / 2:.2f}")


---

21. Compare Standard Deviations and Visualize

import numpy as np
import matplotlib.pyplot as plt

data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(0, 2, 100)

print("STD Data1:", np.std(data1))
print("STD Data2:", np.std(data2))

plt.hist(data1, alpha=0.5, label='Std=1')
plt.hist(data2, alpha=0.5, label='Std=2')
plt.legend()
plt.title("Comparing Standard Deviations")
plt.show()


---

22. Calculate Range and IQR

import numpy as np

data = np.random.normal(100, 15, 100)
data_range = np.max(data) - np.min(data)
iqr = np.percentile(data, 75) - np.percentile(data, 25)

print("Range:", data_range)
print("IQR:", iqr)


---

23. Z-score Normalization and Visualization

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

data = np.random.normal(50, 10, 100)
normalized = zscore(data)

plt.hist(normalized, bins=30, edgecolor='black')
plt.title("Z-score Normalized Data")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()


---

24. Skewness and Kurtosis of Normal Data

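import numpy as np
from scipy.stats import skew, kurtosis

data = np.random.normal(0, 1, 1000)
print("Skewness:", skew(data))
# kurtosis() returns excess (Fisher) kurtosis by default, so a normal sample gives a value near 0
print("Kurtosis:", kurtosis(data))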



PART 2



---

1. Z-test for Sample Mean vs Population Mean

from scipy.stats import norm
import numpy as np

def z_test(sample, pop_mean, pop_std):
    n = len(sample)
    sample_mean = np.mean(sample)
    z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
    p = 2 * (1 - norm.cdf(abs(z)))
    print(f"Z-score = {z:.3f}, P-value = {p:.3f}")
    return z, p

data = np.random.normal(100, 15, 30)
z_test(data, pop_mean=95, pop_std=15)


---

2. Simulate Data and Perform Hypothesis Test

sample = np.random.normal(50, 10, 40)
z_stat, p_val = z_test(sample, pop_mean=52, pop_std=10)
print(f"Simulated P-value: {p_val:.4f}")


---

3. One-sample Z-test

Same as #1: the z_test() function defined there is a reusable one-sample Z-test.


---

4. Two-tailed Z-test with Decision Region Plot

import matplotlib.pyplot as plt

z_score = 2.0
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)

plt.plot(x, y)
plt.fill_between(x, y, where=(x <= -1.96) | (x >= 1.96), color='red', alpha=0.3, label='Rejection Region')
plt.axvline(z_score, color='blue', linestyle='--', label='Z-score')
plt.title("Two-tailed Z-test")
plt.legend()
plt.show()


---

5. Visualize Type I and II Errors

def plot_type_errors(mu0, mu1, sigma, n, alpha=0.05):
    se = sigma / np.sqrt(n)
    x = np.linspace(mu0 - 4*se, mu1 + 4*se, 1000)
    y0 = norm.pdf(x, mu0, se)
    y1 = norm.pdf(x, mu1, se)
    
    z_critical = norm.ppf(1 - alpha/2)
    upper_crit = mu0 + z_critical * se
    lower_crit = mu0 - z_critical * se
    
    plt.plot(x, y0, label='H0', color='blue')
    plt.plot(x, y1, label='H1', color='orange')
    plt.axvline(upper_crit, color='red', linestyle='--')
    plt.axvline(lower_crit, color='red', linestyle='--')
    
    # Type II error: H1 is true but the statistic falls inside the non-rejection region
    plt.fill_between(x, y1, where=(x >= lower_crit) & (x <= upper_crit), color='green', alpha=0.3, label='Type II Error')
    plt.fill_between(x, y0, where=(x < lower_crit) | (x > upper_crit), color='red', alpha=0.2, label='Type I Error')
    
    plt.legend()
    plt.title("Type I and II Errors")
    plt.show()

plot_type_errors(mu0=100, mu1=105, sigma=15, n=30)


---

6. Independent T-test

from scipy.stats import ttest_ind

group1 = np.random.normal(50, 5, 30)
group2 = np.random.normal(53, 5, 30)
t_stat, p_val = ttest_ind(group1, group2)
print(f"T = {t_stat:.3f}, P = {p_val:.3f}")


---

7. Paired Sample T-test and Visualization

from scipy.stats import ttest_rel

before = np.random.normal(70, 5, 30)
after = before + np.random.normal(2, 3, 30)
t_stat, p_val = ttest_rel(before, after)

plt.plot(before, label="Before")
plt.plot(after, label="After")
plt.title("Paired T-test Data")
plt.legend()
plt.show()
print(f"T = {t_stat:.3f}, P = {p_val:.3f}")


---

8. Compare Z-test and T-test

from scipy.stats import ttest_1samp

data = np.random.normal(100, 15, 25)
z_stat, z_p = z_test(data, pop_mean=95, pop_std=15)
t_stat, t_p = ttest_1samp(data, popmean=95)

print(f"Z-test P = {z_p:.3f}, T-test P = {t_p:.3f}")


---

9. Confidence Interval Function

import scipy.stats as stats

def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    sem = stats.sem(data)
    ci = stats.t.interval(confidence, len(data)-1, loc=mean, scale=sem)
    print(f"{confidence*100:.1f}% CI: {ci}")
    return ci

sample = np.random.normal(55, 10, 40)
confidence_interval(sample)


---

10. Margin of Error Calculation

def margin_of_error(data, confidence=0.95):
    sem = stats.sem(data)
    moe = sem * stats.t.ppf((1 + confidence) / 2, df=len(data)-1)
    print(f"Margin of Error: {moe:.2f}")
    return moe

margin_of_error(sample)


---

11. Bayesian Inference Example

def bayes_theorem(prior_A, prob_B_given_A, prob_B_given_notA):
    prob_notA = 1 - prior_A
    prob_B = prob_B_given_A * prior_A + prob_B_given_notA * prob_notA
    posterior_A = (prob_B_given_A * prior_A) / prob_B
    print(f"Posterior P(A|B): {posterior_A:.4f}")
    return posterior_A

bayes_theorem(0.01, 0.9, 0.05)  # e.g., disease testing


---

12. Chi-square Test for Independence

from scipy.stats import chi2_contingency

data = [[30, 10], [20, 40]]
chi2, p, dof, expected = chi2_contingency(data)
print(f"Chi2 = {chi2:.2f}, P = {p:.3f}")


---

13. Expected Frequencies for Chi-square

observed = np.array([[30, 10], [20, 40]])
_, _, _, expected = chi2_contingency(observed)
print("Expected Frequencies:\n", expected)


---

14. Goodness-of-Fit Test

from scipy.stats import chisquare

observed = [20, 30, 50]
expected = [25, 25, 50]
chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi2 = {chi2:.2f}, P = {p:.3f}")

