Question 1: What are Type I and Type II errors in hypothesis testing, and how do they
impact decision-making?
- In hypothesis testing, we make decisions based on sample data, which can sometimes lead to incorrect conclusions. These incorrect decisions are classified as Type I and Type II errors.

**Type I Error (α error)**

A Type I error occurs when the null hypothesis (H₀) is true, but we incorrectly reject it.

It is also called a false positive.

Example: Concluding that a new treatment is effective when, in reality, it is not.

The probability of making a Type I error is denoted by α (significance level).

**Type II Error (β error)**

A Type II error occurs when the null hypothesis (H₀) is false, but we fail to reject it.

It is also called a false negative.

Example: Concluding that a new treatment is not effective when it actually is.

The probability of making a Type II error is denoted by β, and (1 − β) represents the power of the test.

**Impact on Decision-Making**

Type I errors can lead to accepting false findings, which may result in unnecessary actions or incorrect conclusions being adopted (e.g., approving an ineffective drug).

Type II errors can cause real effects to go unnoticed, which may delay beneficial decisions (e.g., rejecting a truly effective drug).
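
Both error rates can be estimated by simulation. The sketch below is illustrative only: the helper `rejection_rate`, the effect size of 0.5, the sample size of 30, and the fixed seed are assumptions, not values taken from the text. It runs many two-sided Z-tests, first with H₀ true (every rejection is a Type I error) and then with H₀ false (every failure to reject is a Type II error).

```python
import numpy as np
from scipy.stats import norm

def rejection_rate(true_mean, null_mean=0.0, sigma=1.0, n=30,
                   alpha=0.05, num_trials=5000, seed=0):
    """Fraction of simulated two-sided Z-tests that reject H0."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated study of n observations.
    samples = rng.normal(loc=true_mean, scale=sigma, size=(num_trials, n))
    z_stats = (samples.mean(axis=1) - null_mean) / (sigma / np.sqrt(n))
    p_values = 2 * norm.sf(np.abs(z_stats))
    return np.mean(p_values < alpha)

# H0 is true (true mean equals the null mean): rejections are Type I errors.
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): failures to reject are Type II errors.
power = rejection_rate(true_mean=0.5)
type_ii_rate = 1 - power

print(f"Estimated Type I error rate:  {type_i_rate:.3f}")
print(f"Estimated power (1 - beta):   {power:.3f}")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

The estimated Type I error rate should land close to the chosen α, while the Type II error rate depends on the effect size and the sample size.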

Question 2: What is the P-value in hypothesis testing, and how should it be interpreted
in the context of the null hypothesis?
- **P-Value in Hypothesis Testing**

The P-value is the probability of obtaining a test statistic (or result) that is as extreme as, or more extreme than, the one observed, assuming the null hypothesis (H₀) is true.

In simple terms, it measures how compatible the sample data is with the null hypothesis.             

**Interpretation of the P-Value**

A small P-value (typically ≤ 0.05) suggests that the observed data is unlikely under the null hypothesis.

Therefore, we have evidence to reject H₀.

A large P-value (> 0.05) indicates that the observed data is consistent with the null hypothesis.

Therefore, we fail to reject H₀ (i.e., we do not have enough evidence against it).

**Key Points for Interpretation**

The P-value does NOT prove that the null hypothesis is true or false, and it is not the probability that H₀ is true.
It only measures the strength of evidence against H₀.

Lower P-value = stronger evidence against H₀.

Higher P-value = weaker evidence against H₀.
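
As a small illustration of the definition, the sketch below converts a Z statistic into one-sided and two-sided P-values using the standard normal distribution. The helper name `z_p_values` and the example statistic 1.96 are assumptions chosen for illustration.

```python
from scipy.stats import norm

def z_p_values(z_stat):
    """Return (two_sided, upper_tail, lower_tail) P-values for a Z statistic."""
    upper = norm.sf(z_stat)    # P(Z >= z) under H0
    lower = norm.cdf(z_stat)   # P(Z <= z) under H0
    two_sided = 2 * norm.sf(abs(z_stat))  # both tails, as extreme or more
    return two_sided, upper, lower

two_sided, upper, lower = z_p_values(1.96)
print(f"Two-sided P-value:  {two_sided:.4f}")
print(f"Upper-tail P-value: {upper:.4f}")
print(f"Lower-tail P-value: {lower:.4f}")
```

Which tail to use depends on the alternative hypothesis: a two-sided alternative counts extreme results in both directions.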

Question 3: Explain the difference between a Z-test and a T-test, including when to use
each.

**Difference Between a Z-Test and a T-Test**

A Z-test and a T-test are both statistical tests used to compare sample data with population parameters, but they differ in their assumptions and when they should be applied.

**1. Z-Test**

Used when:

The population standard deviation (σ) is known.

The sample size is large (typically n ≥ 30).

Data is normally distributed or approximately normal (due to large sample size).

Purpose:
To test hypotheses about population means or proportions when variability is known and the sample is large.

Example:
Testing whether the mean height of students differs from a known population mean when σ is known.

**2. T-Test**

Used when:

The population standard deviation is unknown.

The sample size is small (typically n < 30). With larger samples the T-test remains valid, and its results approach those of the Z-test.

Data comes from a normally distributed population.

Purpose:
To test hypotheses about population means when the sample size is small and σ must be estimated using the sample standard deviation.

Example:
Comparing the average test scores of a small class to a hypothesized population mean.
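
The practical difference shows up in the critical values. A minimal sketch, assuming a two-sided test at the 95% confidence level and a handful of illustrative sample sizes, compares the t critical value (with n − 1 degrees of freedom) against the fixed Z critical value:

```python
from scipy.stats import norm, t

confidence = 0.95
tail_prob = (1 + confidence) / 2  # upper cumulative probability for a two-sided test

z_crit = norm.ppf(tail_prob)
print(f"Z critical value: {z_crit:.4f}")

# t critical values for several illustrative sample sizes
t_crits = {}
for n in (5, 10, 30, 100, 1000):
    t_crits[n] = t.ppf(tail_prob, df=n - 1)
    print(f"n = {n:>4}: t critical value = {t_crits[n]:.4f}")
```

The t critical value is always larger than the Z critical value, reflecting the extra uncertainty from estimating σ, and the gap shrinks as the sample size grows.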

Question 4: What is a confidence interval, and how does the margin of error influence
its width and interpretation?
- **Confidence Interval and Margin of Error**

A confidence interval (CI) is a range of values, calculated from sample data, that is likely to contain the true population parameter (such as the mean or proportion) with a certain level of confidence (e.g., 95%).
It provides both an estimate and an indication of how precise that estimate is.

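Example: A 95% CI of (48, 52) for a mean suggests we are 95% confident that the true population mean lies between 48 and 52. More precisely, intervals constructed this way capture the true mean in about 95% of repeated samples.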

**Margin of Error (ME) and Its Influence**

The margin of error is the amount added and subtracted from the sample estimate to form the confidence interval:

$$\text{CI} = \text{Estimate} \pm \text{Margin of Error}$$

The margin of error depends on:

The confidence level (higher confidence → larger margin of error)

The variability in data

The sample size (larger samples → smaller margin of error)

**How Margin of Error Affects the Width of the Confidence Interval**

**Larger margin of error → wider confidence interval**

Interpretation: The estimate is less precise.

Caused by high variability, smaller sample sizes, or higher confidence levels.

**Smaller margin of error → narrower confidence interval**

Interpretation: The estimate is more precise.

Caused by lower variability, larger sample sizes, or lower confidence levels.
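
These relationships can be checked numerically. The sketch below assumes the population standard deviation is known, so that the margin of error is z·σ/√n; the helper `margin_of_error` and the value σ = 10 are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.stats import norm

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean when the population standard deviation is known."""
    z_value = norm.ppf((1 + confidence) / 2)
    return z_value * sigma / np.sqrt(n)

sigma = 10  # hypothetical population standard deviation

for confidence in (0.90, 0.95, 0.99):
    for n in (25, 100, 400):
        me = margin_of_error(sigma, n, confidence)
        print(f"confidence={confidence:.2f}, n={n:>3}: "
              f"ME = {me:.2f}, CI width = {2 * me:.2f}")
```

Because the margin of error scales with 1/√n, quadrupling the sample size halves the width of the interval.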

Question 5: Describe the purpose and assumptions of an ANOVA test. How does it
extend hypothesis testing to more than two groups?
- **Purpose of ANOVA**

ANOVA (Analysis of Variance) is a statistical test used to determine whether there are significant differences among the means of three or more independent groups.
Instead of performing multiple t-tests (which increases the chance of Type I errors), ANOVA tests all group means simultaneously.

**Assumptions of ANOVA**

To apply ANOVA correctly, the following assumptions must be met:

**Independence of observations**

Each sample must be collected independently from the others.

**Normality**

The data in each group should be approximately normally distributed.

**Homogeneity of variances (equal variances)**

All groups should have similar variances (also called homoscedasticity).

**How ANOVA Extends Hypothesis Testing to More Than Two Groups**

A t-test compares the means of two groups only.

ANOVA generalizes this idea and allows comparison across three or more groups in a single test.

How it works:

Null hypothesis (H₀): All group means are equal.

Alternative hypothesis (H₁): At least one group mean is different.

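ANOVA compares the variation between groups with the variation within groups using the F-statistic (the between-group mean square divided by the within-group mean square). A large F-statistic suggests that at least one group mean differs from the others.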
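
The between-group and within-group comparison can be sketched as follows. The three groups of scores are hypothetical data made up for illustration; the F-statistic is computed by hand as the ratio of the between-group mean square to the within-group mean square and then checked against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical scores for three independent groups (illustrative only).
group_a = np.array([85, 90, 88, 92, 87])
group_b = np.array([78, 82, 80, 79, 81])
group_c = np.array([90, 94, 91, 95, 93])
groups = [group_a, group_b, group_c]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Variation of group means around the grand mean (between groups)
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean (within groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)

# Same test using SciPy's one-way ANOVA
f_scipy, p_scipy = f_oneway(*groups)

print(f"Manual F = {f_manual:.3f}, p = {p_manual:.6f}")
print(f"SciPy  F = {f_scipy:.3f}, p = {p_scipy:.6f}")
```

A significant result only indicates that at least one mean differs; it does not say which groups differ.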

Question 6: Write a Python program to perform a one-sample Z-test and interpret the
result for a given dataset.


In [None]:
import numpy as np
from scipy.stats import norm

# Sample data and the hypothesized population parameters
data = np.array([52, 49, 50, 51, 53, 54, 48, 50, 49, 52])
population_mean = 50  # mean under the null hypothesis
population_std = 3  # σ must be known for a Z-test
sample_mean = np.mean(data)
n = len(data)

# Z-statistic: distance of the sample mean from the H0 mean, in standard errors
z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(n))

# Two-sided P-value; norm.sf gives the upper-tail probability (1 - cdf)
p_value = 2 * norm.sf(abs(z_stat))
print("Sample Mean:", sample_mean)
print("Z-Statistic:", z_stat)
print("P-Value:", p_value)

# Decision at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject the null hypothesis. The sample mean is significantly different.")
else:
    print("Conclusion: Fail to reject the null hypothesis. No significant difference found.")


Question 7: Simulate a dataset from a binomial distribution (n = 10, p = 0.5) using
NumPy and plot the histogram.


In [None]:
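import numpy as np
import matplotlib.pyplot as plt

# Binomial parameters: 10 trials per experiment, success probability 0.5
n_trials = 10
p_success = 0.5
sample_size = 1000  # number of simulated experiments

# Each value is the number of successes observed in one experiment
data = np.random.binomial(n=n_trials, p=p_success, size=sample_size)

# Bin edges at -0.5, 0.5, ..., 10.5 so each bar is centred on an integer count
plt.hist(data, bins=np.arange(0, 12) - 0.5, edgecolor='black', alpha=0.7)
plt.title("Histogram of Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.xticks(range(0, 11))
plt.show()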


Question 8: Generate multiple samples from a non-normal distribution and implement
the Central Limit Theorem using Python.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Draw 1000 samples of size 50 from a skewed (exponential) distribution
sample_size = 50
num_samples = 1000
data = np.random.exponential(scale=1, size=(num_samples, sample_size))

# Mean of each sample (one mean per row)
sample_means = np.mean(data, axis=1)

plt.figure(figsize=(12, 5))

# Left panel: the original, right-skewed data
plt.subplot(1, 2, 1)
plt.hist(data.flatten(), bins=40, color='skyblue', edgecolor='black')
plt.title("Original Distribution (Exponential)")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Right panel: the sample means, which are approximately normal
plt.subplot(1, 2, 2)
plt.hist(sample_means, bins=30, color='salmon', edgecolor='black')
plt.title("Distribution of Sample Means (CLT)")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")

plt.tight_layout()
plt.show()
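
A numerical check can complement the histograms. Under the Central Limit Theorem, the sample means should centre near the population mean, with a standard deviation close to σ/√n. The following is a minimal sketch, assuming an exponential population with scale 1 (so the mean and σ are both 1) and a fixed seed chosen only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption, for reproducibility)

sample_size = 50
num_samples = 1000
population_mean = 1.0  # mean of an exponential distribution with scale=1
population_std = 1.0   # standard deviation of the same distribution

# One row per sample, then one mean per row
samples = rng.exponential(scale=1.0, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

observed_mean = sample_means.mean()
observed_std = sample_means.std(ddof=1)
expected_std = population_std / np.sqrt(sample_size)  # standard error predicted by the CLT

print(f"Mean of sample means: {observed_mean:.4f} (population mean: {population_mean})")
print(f"Std of sample means:  {observed_std:.4f} (predicted sigma/sqrt(n): {expected_std:.4f})")
```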


Question 9: Write a Python function to calculate and visualize the confidence interval
for a sample mean.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

def plot_confidence_interval(data, confidence=0.95):
    """
    Calculate and visualize the confidence interval for a sample mean.
    data: list or numpy array of numeric values
    confidence: confidence level as a fraction (default = 0.95, i.e., 95%)
    """
    # Point estimate and sample standard deviation (ddof=1 divides by n - 1)
    sample_mean = np.mean(data)
    sample_std = np.std(data, ddof=1)
    n = len(data)

    # Standard error of the mean
    standard_error = sample_std / np.sqrt(n)

    # Critical value from the t-distribution, because σ is estimated from a small sample
    t_value = t.ppf((1 + confidence) / 2, df=n - 1)

    # Margin of error and interval bounds
    margin_of_error = t_value * standard_error
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    print(f"Sample Mean: {sample_mean:.3f}")
    print(f"{confidence*100:.0f}% Confidence Interval: ({lower_bound:.3f}, {upper_bound:.3f})")

    # Mark the sample mean and both interval bounds on a number line
    plt.figure(figsize=(8, 4))
    plt.axvline(sample_mean, color='blue', label='Sample Mean')
    plt.axvline(lower_bound, color='red', linestyle='--', label='Lower CI Bound')
    plt.axvline(upper_bound, color='red', linestyle='--', label='Upper CI Bound')

    plt.title(f"{confidence*100:.0f}% Confidence Interval for Sample Mean")
    plt.xlabel("Value")
    plt.legend()
    plt.show()


# Example usage with a small sample
data = np.array([12, 15, 14, 16, 13, 11, 15, 14, 13, 12])
plot_confidence_interval(data, confidence=0.95)


Question 10: Perform a Chi-square goodness-of-fit test using Python to compare
observed and expected distributions, and explain the outcome.

In [None]:
import numpy as np
from scipy.stats import chisquare

# Observed counts for six categories
observed = np.array([18, 20, 16, 22, 19, 15])
# Expected counts under H0: all six categories are equally likely
expected = np.array([1/6] * 6) * observed.sum()

# Goodness-of-fit test (degrees of freedom = number of categories - 1)
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)

# Decision at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject the null hypothesis. Observed distribution differs from expected.")
else:
    print("Conclusion: Fail to reject the null hypothesis. No significant difference from the expected distribution.")


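**Purpose**

The Chi-square goodness-of-fit test checks whether an observed categorical distribution matches an expected distribution.

Null hypothesis (H₀): The observed frequencies match the expected frequencies.

Alternative hypothesis (H₁): The observed frequencies do not match the expected frequencies.

**Outcome**

For the data above, the Chi-square statistic is about 1.82 with 5 degrees of freedom, giving a P-value of about 0.87. Since this is far above 0.05, we fail to reject H₀: the observed counts are consistent with six equally likely categories.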
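
The statistic reported by `chisquare` can be reproduced by hand, which makes the calculation explicit: sum (observed − expected)² / expected over the categories, then look up the upper-tail probability of a Chi-square distribution with k − 1 degrees of freedom. A minimal sketch using the same observed counts, with variable names such as `chi2_manual` chosen here for illustration:

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([18, 20, 16, 22, 19, 15])
# Equal expected counts under H0 (each category has probability 1/6)
expected = np.full(observed.size, observed.sum() / observed.size)

# Chi-square statistic computed directly from its definition
chi2_manual = np.sum((observed - expected) ** 2 / expected)
dof = observed.size - 1  # degrees of freedom: categories minus one
p_manual = chi2.sf(chi2_manual, dof)

# Reference values from SciPy
chi2_scipy, p_scipy = chisquare(f_obs=observed, f_exp=expected)

print(f"Manual: statistic = {chi2_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  statistic = {chi2_scipy:.4f}, p = {p_scipy:.4f}")
```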
