# Theory Questions

1. What is hypothesis testing in statistics ?

    - Hypothesis testing is a statistical method used to make decisions or inferences about population parameters using sample data. It involves forming two hypotheses: the null hypothesis (H₀) and the alternative hypothesis (H₁), collecting data, and using a test to determine whether to reject H₀ in favor of H₁.

2. What is the null hypothesis, and how does it differ from the alternative hypothesis ?

    - The null hypothesis (H₀) states there is no effect or difference, while the alternative hypothesis (H₁) claims there is an effect or difference. For example, H₀: "There is no difference in means" vs. H₁: "There is a difference in means".

3. What is the significance level in hypothesis testing, and why is it important ?

    - The significance level (denoted as α) is the probability of rejecting the null hypothesis when it is actually true. Common values are 0.05 or 0.01. It sets the threshold for determining statistical significance and helps control Type 1 error.

4. What does a P-value represent in hypothesis testing ?

    - The P-value is the probability of observing a test statistic at least as extreme as the one actually observed, assuming the null hypothesis is true. A small P-value means the data would be unlikely under H₀, which is the basis for deciding whether to reject it.

5. How do you interpret the P-value in hypothesis testing ? 

    - If P-value ≤ α: reject H₀ (significant result)

    - If P-value > α: fail to reject H₀ (not significant)
        
        The smaller the P-value, the stronger the evidence against H₀. It is not the probability that H₀ is true.
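
The decision rule can be sketched in a few lines. This is a minimal illustration that assumes a two-tailed Z-test; the function name `decide` and the example Z-statistics are hypothetical.

```python
from scipy.stats import norm

def decide(z_stat, alpha=0.05):
    """Return the two-tailed P-value and the decision for a Z-statistic."""
    p_value = 2 * norm.sf(abs(z_stat))  # area in both tails beyond |z|
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return p_value, decision

# Hypothetical Z-statistics, chosen only to show both outcomes
for z in (0.8, 2.5):
    p, decision = decide(z)
    print(f"z = {z}: P-value = {p:.4f} -> {decision}")
```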

6. What are Type 1 and Type 2 errors in hypothesis testing ? 

    - Type 1 Error: Rejecting H₀ when it’s true (false positive)

    - Type 2 Error: Not rejecting H₀ when it’s false (false negative)

        For a fixed sample size, reducing one typically increases the other.
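
The trade-off between the two errors can be made concrete with a small calculation. The sketch assumes a one-sided (upper-tail) Z-test and a hypothetical true effect of 1.5 standard errors; `type2_error` is an illustrative helper, not a library function.

```python
from scipy.stats import norm

def type2_error(alpha, shift):
    """Beta for an upper-tail Z-test when the statistic is N(shift, 1) under H1."""
    critical = norm.ppf(1 - alpha)      # reject H0 when Z > critical
    return norm.cdf(critical - shift)   # P(fail to reject H0 | H1 is true)

# Lowering alpha (fewer false positives) raises beta (more false negatives)
for alpha in (0.10, 0.05, 0.01):
    beta = type2_error(alpha, shift=1.5)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```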

7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing ?

    - One-tailed tests look for an effect in one direction only (e.g., > or <).

    - Two-tailed tests consider both directions (≠).

        The choice depends on the research question.
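
The same test statistic can lead to different conclusions depending on the alternative hypothesis. The sketch below assumes a Z-statistic of 1.8, a value picked purely for illustration.

```python
from scipy.stats import norm

z_stat = 1.8  # hypothetical observed Z-statistic

p_right = norm.sf(z_stat)          # one-tailed, H1: parameter > null value
p_left = norm.cdf(z_stat)          # one-tailed, H1: parameter < null value
p_two = 2 * norm.sf(abs(z_stat))   # two-tailed, H1: parameter != null value

print(f"Right-tailed P-value: {p_right:.4f}")
print(f"Left-tailed P-value:  {p_left:.4f}")
print(f"Two-tailed P-value:   {p_two:.4f}")
```

At α = 0.05 the right-tailed test rejects H₀ here while the two-tailed test does not, which is why the direction must be chosen from the research question before looking at the data.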

8. What is the Z-test, and when is it used in hypothesis testing ?

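    - A Z-test compares a sample mean with a population mean when the population standard deviation (σ) is known. With a large sample (commonly n > 30), the sample standard deviation is often used as an approximation for σ. The test helps determine whether the sample mean differs significantly from the population mean.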

9. How do you calculate the Z-score, and what does it represent in hypothesis testing ?

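    - Z = (X - μ) / σ
        
        It measures how many standard deviations a value X lies from the mean μ. When testing a sample mean, the statistic becomes Z = (x̄ - μ) / (σ / √n), which is compared against the standard normal distribution to judge significance.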

10. What is the T-distribution, and when should it be used instead of the normal distribution ?

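    - The T-distribution is used instead of the normal distribution when the population standard deviation is unknown and must be estimated from the sample, which matters most for small samples (n < 30). It has heavier tails than the normal distribution to account for this extra uncertainty, and it approaches the normal distribution as the degrees of freedom increase.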
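
One way to see the difference is to compare two-tailed 95% critical values. This sketch uses an arbitrary set of degrees of freedom, and the dictionary name `t_crits` is only illustrative.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-tailed 95% critical value of the normal distribution

# Critical values of the T-distribution for a few illustrative degrees of freedom
t_crits = {df: t.ppf(0.975, df) for df in (5, 10, 30, 100)}

print(f"Normal: {z_crit:.3f}")
for df, crit in t_crits.items():
    print(f"T (df = {df}): {crit:.3f}")
```

The T critical values are larger for small degrees of freedom and shrink toward the normal value as the sample grows.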

11. What is the difference between a Z-test and a T-test ?

    - Z-test: known σ, large n

    - T-test: unknown σ, small n
        
        Both test mean differences but under different conditions.

12. What is the T-test, and how is it used in hypothesis testing ?

    - The T-test is used to determine if there’s a significant difference between sample means or between a sample mean and a known value. Types include one-sample, two-sample (independent), and paired T-tests.

13. What is the relationship between Z-test and T-test in hypothesis testing ?

    - Both are parametric tests for comparing means. As the sample size grows, the T-distribution converges to the standard normal distribution, so the T-test and the Z-test give nearly identical results for large n.

14. What is a confidence interval, and how is it used to interpret statistical results ?

    - A confidence interval gives a range of plausible values for a population parameter. A 95% CI is built by a procedure that, over repeated sampling, captures the true value about 95% of the time; informally, we are 95% confident the true value lies within that range.
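
The repeated-sampling meaning of "95%" can be checked with a simulation. This is a rough sketch: the population (mean 50, standard deviation 10), the sample size, and the number of trials are all assumed for illustration, and the intervals are T-based.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
true_mean, sigma, n, trials = 50, 10, 25, 1000  # hypothetical population and design

# Draw many samples and build a 95% T-interval from each one
samples = rng.normal(true_mean, sigma, size=(trials, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
margins = t.ppf(0.975, df=n - 1) * sems

# Fraction of intervals that contain the true mean
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
coverage = covered.mean()
print(f"Share of 95% intervals containing the true mean: {coverage:.3f}")
```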

15. What is the margin of error, and how does it affect the confidence interval ?

    - It is the half-width of a confidence interval: the interval is the estimate ± the margin of error. A larger margin means less precision. It grows with the confidence level and with the variability in the data, and shrinks as the sample size increases.

16. How is Bayes' Theorem used in statistics, and what is its significance ?

    - Bayes' Theorem updates the probability of a hypothesis in light of new evidence: P(A|B) = P(B|A) · P(A) / P(B), where P(A) is the prior and P(A|B) is the posterior. It’s useful in spam detection, medical testing, etc.
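
A medical-testing example shows the update in practice. The numbers below (1% prevalence, 90% sensitivity, 5% false positive rate) are made up for illustration, and the evidence probability is obtained from the law of total probability.

```python
# Hypothetical screening test; all three inputs are assumed values
prior = 0.01                 # P(disease)
sensitivity = 0.90           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

# Law of total probability: P(positive)
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' Theorem: P(disease | positive)
posterior = sensitivity * prior / p_positive
print(f"P(positive) = {p_positive:.4f}")
print(f"P(disease | positive) = {posterior:.4f}")
```

Even with a fairly accurate test, the posterior stays well below 50% because the condition is rare, which is the kind of result Bayes' Theorem makes visible.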

17. What is the Chi-square distribution, and when is it used ?

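    - The Chi-square distribution is the distribution of a sum of squared independent standard normal variables, with the number of terms as its degrees of freedom. It is used to test the relationship between categorical variables (test of independence) and for goodness-of-fit tests, both of which compare observed and expected frequencies.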

18. What is the Chi-square goodness of fit test, and how is it applied ?

    - This test checks how well observed data fit a theoretical distribution. It helps determine whether a sample matches the expected distribution.

19. What is the F-distribution, and when is it used in hypothesis testing ?

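    - The F-distribution is the distribution of the ratio of two independent Chi-square variables, each divided by its degrees of freedom. It is used to compare the variances of two populations and underlies ANOVA and other F-tests.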

20. What is an ANOVA test, and what are its assumptions ?

    - ANOVA (Analysis of Variance) tests for differences between three or more group means. It assumes independent observations, normally distributed data within each group, and equal variances across groups.
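
A one-way ANOVA can be run with `scipy.stats.f_oneway`. The sketch below simulates three groups with assumed means and a common standard deviation; the group names and parameters are illustrative only.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

# Three simulated groups with equal variance; only group_c has a different mean
group_a = rng.normal(50, 5, 30)
group_b = rng.normal(50, 5, 30)
group_c = rng.normal(60, 5, 30)

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print("F-statistic:", f_stat)
print("P-value:", p_value)
```

A small P-value indicates that at least one group mean differs; it does not say which one, so a follow-up comparison is usually needed.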

21. What are the different types of ANOVA tests ?

    - One-way ANOVA: one factor

    - Two-way ANOVA: two factors

    - Repeated Measures ANOVA: same subjects tested under different conditions

22. What is the F-test, and how does it relate to hypothesis testing ?

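    - An F-test compares two variances, or multiple group means via ANOVA, using a ratio of variances as the test statistic. A ratio far from what the F-distribution predicts under H₀ suggests that the variances (or group means) are not equal.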
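
A two-sample F-test for equal variances can be sketched by hand with the F-distribution. The helper `variance_f_test` and the two small datasets are hypothetical, and the test assumes both samples come from normal populations.

```python
import numpy as np
from scipy.stats import f

def variance_f_test(x, y):
    """Two-sided F-test for equality of two variances (assumes normal data)."""
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)  # ratio of sample variances
    dfn, dfd = len(x) - 1, len(y) - 1
    # Double the smaller tail area to get a two-sided P-value
    p_value = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))
    return f_stat, min(p_value, 1.0)

# Hypothetical measurements: x is tightly clustered, y is widely spread
x = [21, 23, 19, 24, 25, 22, 20, 23]
y = [30, 12, 25, 8, 35, 15, 28, 10]

f_stat, p_value = variance_f_test(x, y)
print("F-statistic:", f_stat)
print("P-value:", p_value)
```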

# Practical Questions (Part 1)

In [None]:
# 1. Write a Python program to generate a random variable and display its value.

import numpy as np

# Generate a single random variable between 0 and 1
random_var = np.random.rand()
print("Random Variable:", random_var)


In [None]:
# 2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF).

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import bernoulli, binom, poisson, uniform, norm, zscore, ttest_ind, ttest_rel, chi2_contingency

# Discrete uniform values (e.g., dice outcomes)
values = [1, 2, 3, 4, 5, 6]
probability = [1/6] * 6  # Equal probability for each value

plt.stem(values, probability)
plt.title("Discrete Uniform Distribution - PMF")
plt.xlabel("Value")
plt.ylabel("Probability")
plt.grid(True)
plt.show()


In [None]:
# 3. Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution.

from scipy.stats import bernoulli

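def bernoulli_pdf(p, x):
    # A Bernoulli variable is discrete, so its distribution is given by a probability mass function (PMF)
    return bernoulli.pmf(x, p)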

# Example
print("PDF:", bernoulli_pdf(0.6, 1))


In [None]:
# 4. Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram.

from scipy.stats import binom

data = binom.rvs(n=10, p=0.5, size=1000)

# One bin per integer outcome (0 to 10), centered on the integer
plt.hist(data, bins=np.arange(-0.5, 11.5, 1), edgecolor='black')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Outcomes")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 5. Create a Poisson distribution and visualize it using Python.

from scipy.stats import poisson

poisson_data = poisson.rvs(mu=3, size=1000)

# One bin per integer count, centered on the integer
plt.hist(poisson_data, bins=np.arange(-0.5, poisson_data.max() + 1.5, 1), edgecolor='black')
plt.title("Poisson Distribution (λ=3)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 6.  Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution.

import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
probability = np.array([1/6]*6)
cdf = np.cumsum(probability)

plt.step(values, cdf, where='post')
plt.title("CDF of Discrete Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.grid(True)
plt.show()


In [None]:
# 7. Generate a continuous uniform distribution using NumPy and visualize it.

data = np.random.uniform(low=0, high=1, size=1000)

plt.hist(data, bins=20, edgecolor='black')
plt.title("Continuous Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 8. Simulate data from a normal distribution and plot its histogram.

normal_data = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(normal_data, bins=30, edgecolor='black')
plt.title("Normal Distribution (mean=0, std=1)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 9. Write a Python function to calculate Z-scores from a dataset and plot them.

from scipy.stats import zscore

def plot_z_scores(data):
    z_scores = zscore(data)
    plt.plot(z_scores)
    plt.title("Z-scores of Dataset")
    plt.xlabel("Index")
    plt.ylabel("Z-score")
    plt.grid(True)
    plt.show()

# Example
plot_z_scores(normal_data)


In [None]:
# 10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution.

# Non-normal (exponential) population
population = np.random.exponential(scale=2, size=10000)

# Take 1000 sample means of size 30
sample_means = [np.mean(np.random.choice(population, size=30)) for _ in range(1000)]

plt.hist(sample_means, bins=30, edgecolor='black')
plt.title("Central Limit Theorem (Exponential Population)")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 11. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem.

sample_means = [np.mean(np.random.normal(loc=50, scale=10, size=30)) for _ in range(1000)]

plt.hist(sample_means, bins=30, edgecolor='black')
plt.title("CLT Verification using Normal Distribution")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 12. Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1).

def plot_standard_normal():
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x, loc=0, scale=1)

    plt.plot(x, y)
    plt.title("Standard Normal Distribution (PDF)")
    plt.xlabel("Z-score")
    plt.ylabel("Probability Density")
    plt.grid(True)
    plt.show()

plot_standard_normal()


In [None]:
# 13. Generate random variables and calculate their corresponding probabilities using the binomial distribution.

k_values = np.arange(11)
binom_probs = binom.pmf(k=k_values, n=10, p=0.5)
print("Binomial probabilities:", binom_probs)


In [None]:
# 14. Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution.

data_point = 72
mean = 70
std_dev = 5

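z_score = (data_point - mean) / std_dev
proportion_below = norm.cdf(z_score)  # area of the standard normal to the left of this Z-score

print("Z-score:", z_score)
print("Proportion of the standard normal below this Z-score:", proportion_below)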


In [None]:
# 15. Implement hypothesis testing using Z-statistics for a sample dataset.

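sample_data = np.random.normal(loc=70, scale=5, size=30)
sample_mean = np.mean(sample_data)

# H0: population mean = 70, with the population standard deviation (5) treated as known
z_stat = (sample_mean - 70) / (5 / np.sqrt(30))
p_value = 2 * norm.sf(abs(z_stat))  # two-tailed P-value
print("Z-statistic:", z_stat)
print("P-value:", p_value)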


In [None]:
# 16. Create a confidence interval for a dataset using Python and interpret the result.

sem = 5 / np.sqrt(30)
ci_lower = sample_mean - 1.96 * sem
ci_upper = sample_mean + 1.96 * sem
print("95% Confidence Interval:", (ci_lower, ci_upper))


In [None]:
# 17. Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean.

data = np.random.normal(loc=60, scale=10, size=50)
mean = np.mean(data)
sem = np.std(data, ddof=1) / np.sqrt(len(data))  # ddof=1 gives the sample standard deviation
ci = (mean - 1.96*sem, mean + 1.96*sem)
print("CI for normal data:", ci)


In [None]:
# 18. Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution.

x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)

plt.plot(x, y)
plt.title("Normal Distribution PDF")
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.grid(True)
plt.show()


In [None]:
# 19. Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution.

x = np.arange(0, 10)
cdf = poisson.cdf(x, mu=3)

plt.step(x, cdf, where='post')  # a discrete CDF jumps at each integer and stays flat until the next
plt.title("Poisson CDF (λ=3)")
plt.xlabel("x")
plt.ylabel("Cumulative Probability")
plt.grid(True)
plt.show()


In [None]:
# 20. Simulate a random variable using a continuous uniform distribution and calculate its expected value.

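data = np.random.uniform(low=0, high=10, size=1000)
expected_value = np.mean(data)  # the sample mean estimates the expected value
print("Estimated Expected Value:", expected_value)
print("Theoretical Expected Value:", (0 + 10) / 2)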


In [None]:
# 21. Write a Python program to compare the standard deviations of two datasets and visualize the difference.

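a = np.random.normal(0, 1, 100)
b = np.random.normal(0, 2, 100)

print("Std of Dataset A:", np.std(a, ddof=1))
print("Std of Dataset B:", np.std(b, ddof=1))

plt.boxplot([a, b])
plt.xticks([1, 2], ["Dataset A", "Dataset B"])
plt.title("Standard Deviation Comparison")
plt.show()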


In [None]:
# 22. Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution.

data = np.random.normal(0, 1, 100)
range_val = np.ptp(data)
iqr_val = np.percentile(data, 75) - np.percentile(data, 25)

print("Range:", range_val)
print("IQR:", iqr_val)


In [None]:
# 23. Implement Z-score normalization on a dataset and visualize its transformation.

normalized_data = zscore(data)

plt.hist(normalized_data, bins=30, edgecolor='black')
plt.title("Z-score Normalized Data")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()


In [None]:
# 24. Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution.

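from scipy.stats import skew, kurtosis

def skewness_and_kurtosis(data):
    # kurtosis() returns excess kurtosis (Fisher's definition), which is 0 for a normal distribution
    return skew(data), kurtosis(data)

skewness, excess_kurtosis = skewness_and_kurtosis(data)
print("Skewness:", skewness)
print("Excess Kurtosis:", excess_kurtosis)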
    

# Practical Questions (Part 2)

In [None]:
# 1. Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and interpret the results.

from scipy.stats import norm

sample = np.random.normal(loc=52, scale=5, size=30)
sample_mean = np.mean(sample)
pop_mean = 50
std_dev = 5
z = (sample_mean - pop_mean) / (std_dev / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z)))

print("Z-statistic:", z)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
    

In [None]:
# 2. Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python. 

data = np.random.normal(loc=100, scale=15, size=40)
z_stat = (np.mean(data) - 100) / (15 / np.sqrt(40))
p_val = 2 * (1 - norm.cdf(abs(z_stat)))

print("Z-statistic:", z_stat)
print("P-value:", p_val)


In [None]:
# 3. Implement a one-sample Z-test using Python to compare the sample mean with the population mean.

def one_sample_z_test(data, pop_mean, std_dev):
    z = (np.mean(data) - pop_mean) / (std_dev / np.sqrt(len(data)))
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p

data = np.random.normal(loc=48, scale=4, size=35)
z, p = one_sample_z_test(data, pop_mean=50, std_dev=4)
print("Z:", z, "P-value:", p)


In [None]:
# 4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot.

x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)
z_stat = 2.1  # Example

plt.plot(x, y, label='Normal Curve')
plt.fill_between(x, y, where=(abs(x) > 1.96), color='lightgrey', label='Rejection Region')
plt.axvline(z_stat, color='red', linestyle='--', label='Z-statistic')
plt.title("Two-tailed Z-test")
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# 5. Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing.

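def plot_errors():
    x = np.linspace(-4, 4, 1000)
    h0 = norm.pdf(x, 0, 1)
    h1 = norm.pdf(x, 1.5, 1)
    critical = 1.96  # reject H0 when the statistic exceeds this value

    alpha = norm.sf(critical, 0, 1)    # Type I error: reject H0 when H0 is true
    beta = norm.cdf(critical, 1.5, 1)  # Type II error: fail to reject H0 when H1 is true

    plt.plot(x, h0, label='H0: Null Hypothesis')
    plt.plot(x, h1, label='H1: Alternative Hypothesis')
    plt.fill_between(x, h0, where=(x > critical), color='lightcoral', alpha=0.5, label=f'Type I Error (α = {alpha:.3f})')
    plt.fill_between(x, h1, where=(x < critical), color='lightblue', alpha=0.5, label=f'Type II Error (β = {beta:.3f})')
    plt.legend()
    plt.title("Type I and Type II Errors")
    plt.grid(True)
    plt.show()
    return alpha, beta

alpha, beta = plot_errors()
print("Type I error (α):", alpha)
print("Type II error (β):", beta)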


In [None]:
# 6. Write a Python program to perform an independent T-test and interpret the results.

group1 = np.random.normal(60, 10, 30)
group2 = np.random.normal(65, 12, 30)
t_stat, p_val = ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("P-value:", p_val)
if p_val < 0.05:
    print("Significant difference.")
else:
    print("No significant difference.")


In [None]:
# 7. Perform a paired sample T-test using Python and visualize the comparison results.

before = np.random.normal(70, 5, 30)
after = before + np.random.normal(2, 2, 30)
t_stat, p_val = ttest_rel(before, after)

print("Paired T-test result:", t_stat, p_val)
plt.plot(before, label="Before")
plt.plot(after, label="After")
plt.legend()
plt.title("Paired Sample Comparison")
plt.show()


In [None]:
# 8. Simulate data and perform both Z-test and T-test, then compare the results using Python.

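from scipy.stats import ttest_1samp

# Use the same sample for both tests so the results are directly comparable
sample = np.random.normal(50, 5, 40)

# Z-test: population standard deviation (5) treated as known
z = (np.mean(sample) - 50) / (5 / np.sqrt(len(sample)))
p_z = 2 * (1 - norm.cdf(abs(z)))

# T-test: standard deviation estimated from the sample
t_stat, p_t = ttest_1samp(sample, popmean=50)

print("Z-statistic:", z, "Z-test p-value:", p_z)
print("T-statistic:", t_stat, "T-test p-value:", p_t)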


In [None]:
# 9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance.

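from scipy.stats import t as t_dist

def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    sem = np.std(data, ddof=1) / np.sqrt(len(data))
    # Critical value from the T-distribution, since the standard deviation is estimated from the sample
    critical = t_dist.ppf((1 + confidence) / 2, df=len(data) - 1)
    margin = critical * sem
    return (mean - margin, mean + margin)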

data = np.random.normal(60, 8, 30)
print("95% CI:", confidence_interval(data))


In [None]:
# 10. Write a Python program to calculate the margin of error for a given confidence level using sample data.

confidence = 0.95
sample = np.random.normal(70, 10, 50)
sem = np.std(sample, ddof=1) / np.sqrt(len(sample))
z_crit = norm.ppf((1 + confidence) / 2)  # about 1.96 for 95% confidence
moe = z_crit * sem
print("Margin of Error:", moe)


In [None]:
# 11. Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process.

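# P(A|B) = [P(B|A) * P(A)] / P(B)
p_a = 0.01          # prior: probability of A before seeing the evidence
p_b_given_a = 0.9   # likelihood: probability of evidence B if A is true
p_b = 0.05          # marginal probability of evidence B

# Posterior: updated probability of A after observing B
p_a_given_b = (p_b_given_a * p_a) / p_b
print("Posterior Probability (P(A|B)):", p_a_given_b)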


In [None]:
# 12. Perform a Chi-square test for independence between two categorical variables in Python.

data = [[30, 10], [20, 40]]

chi2, p, dof, expected = chi2_contingency(data)

print("Chi-square Statistic:", chi2)
print("P-value:", p)
print("Expected Frequencies:", expected)


In [None]:
# 13. Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data.


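observed = np.array([[30, 10], [20, 40]])

# Expected frequency for each cell = (row total * column total) / grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()

print("Observed Frequencies:\n", observed)
print("Expected Frequencies:\n", expected)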


In [None]:
# 14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution.

from scipy.stats import chisquare

observed = [18, 22, 30]
expected = [20, 20, 30]

chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)
