# **Applied Statistics and Inference | Assignment**

**Question 1: What are Type I and Type II errors in hypothesis testing, and how do they
impact decision-making?**
-
**Answer:**
-
- In hypothesis testing, a Type I error happens when we reject a true null hypothesis. It means we think there is an effect or difference when actually there isn’t one. The probability of making a Type I error is called alpha (α), often set at 0.05.

- A Type II error occurs when we fail to reject a false null hypothesis, meaning we miss a real effect or difference. The probability of making a Type II error is beta (β), and the power of a test is (1 - β).

**These errors affect decision-making because:**

- A Type I error can lead to believing something is true when it’s not (false positive).

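- A Type II error can lead to ignoring a real effect (false negative).

For a fixed sample size, lowering α reduces the risk of a Type I error but raises the risk of a Type II error, so the significance level should reflect which mistake is more costly in the situation at hand.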
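
A small simulation can make the two error rates concrete. The sketch below is illustrative only: the sample size, effect size, and number of simulated experiments are assumed values, and it uses SciPy's one-sample t-test to test H0: mean = 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05       # significance level (assumed)
n = 30             # sample size per experiment (assumed)
trials = 5000      # number of simulated experiments (assumed)
true_shift = 0.5   # real effect used for the Type II scenario (assumed)

def rejection_rate(true_mean):
    """Fraction of simulated experiments in which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(trials, n))
    p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return np.mean(p_values < alpha)

# H0 is true here, so every rejection is a Type I error
type1_rate = rejection_rate(0.0)
# H0 is false here, so every failure to reject is a Type II error
type2_rate = 1 - rejection_rate(true_shift)

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
print(f"Estimated power:              {1 - type2_rate:.3f}")
```

The estimated Type I rate should land close to α, while the Type II rate depends on the assumed effect size and sample size.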

**Question 2: What is the P-value in hypothesis testing, and how should it be interpreted
in the context of the null hypothesis?**
-
**Answer:**
-
The P-value is the probability of getting a test statistic as extreme as, or more extreme than, the observed result, if the null hypothesis is true.

- A small P-value (below the chosen significance level, commonly 0.05) means the observed data would be unlikely if the null hypothesis were true, so we usually reject it.

- A large P-value (above the significance level) suggests there isn’t enough evidence to reject the null, so we fail to reject it. This does not prove the null hypothesis is true.

**In short:**
- The P-value measures how compatible the sample data are with the null hypothesis. It is not the probability that the null hypothesis is true.
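
As a minimal sketch of this definition, the helper below (a hypothetical function, not part of the assignment) converts a z statistic into a two-sided P-value under a standard normal null distribution. The three z values and the 0.05 threshold are assumed for illustration.

```python
from scipy import stats

def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z under a standard normal null."""
    return 2 * stats.norm.sf(abs(z))

alpha = 0.05  # assumed significance level
for z in (0.5, 1.96, 3.0):  # hypothetical observed test statistics
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:.2f} -> p = {p:.4f} ({decision})")
```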

**Question 3: Explain the difference between a Z-test and a T-test, including when to use
each.**
-
**Answer:**
-
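The Z-test and T-test are both used to compare means; they differ mainly in whether the population standard deviation (σ) is known and in the reference distribution they use.

- Z-test: Used when σ is known; the test statistic follows the standard normal distribution. With a large sample (a common rule of thumb is n > 30), it is also used as an approximation when σ is estimated from the data.

- T-test: Used when σ is unknown and is estimated from the sample; the test statistic follows a t-distribution with n − 1 degrees of freedom. The distinction matters most for small samples (n < 30), and the T-test approaches the Z-test as n grows.

Both tests assume the data are approximately normally distributed, although for large samples the Central Limit Theorem makes this assumption less critical.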

**Example:**
-

- Z-test → Testing average height of students when population σ is known.

- T-test → Testing mean exam score of a small class when σ is unknown.
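
To see the two tests side by side, the sketch below runs both on the same small sample. The scores, the hypothesized mean, and the "known" σ are all hypothetical values chosen for illustration.

```python
import numpy as np
from scipy import stats

scores = np.array([72, 68, 75, 71, 69, 74, 70, 73])  # hypothetical small sample
mu0 = 70.0          # hypothesized population mean (assumed)
sigma_known = 2.5   # population SD, assumed known only for the Z-test

n = scores.size
sample_mean = scores.mean()

# Z-test: standard error uses the known population SD; reference distribution is normal
z_stat = (sample_mean - mu0) / (sigma_known / np.sqrt(n))
z_p = 2 * stats.norm.sf(abs(z_stat))

# T-test: standard error uses the sample SD; reference distribution is t with n - 1 df
t_stat, t_p = stats.ttest_1samp(scores, popmean=mu0)

print(f"Z-test: z = {z_stat:.3f}, p = {z_p:.4f}")
print(f"T-test: t = {t_stat:.3f}, p = {t_p:.4f}")
```

The t-distribution has heavier tails than the normal, so for a similar test statistic the T-test reports a larger P-value on a small sample.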

**Question 4: What is a confidence interval, and how does the margin of error influence
its width and interpretation?**
-
**Answer:**
-
- A confidence interval (CI) gives a range of values that is likely to contain the true population parameter (like the mean). For example, a 95% CI means that if we repeated the sampling many times, about 95% of the intervals built this way would contain the true mean.

**The margin of error (ME) affects how wide the CI is:**
-

- A larger ME → wider interval → less precise estimate.

- A smaller ME → narrower interval → more precise estimate.

ME depends on the sample size, variability, and confidence level. Higher confidence levels and smaller sample sizes both increase the margin of error.
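
The relationship can be sketched numerically. The helper below assumes a z-based margin of error for a mean with known σ (critical value times σ/√n); the function name and the value σ = 10 are hypothetical.

```python
import numpy as np
from scipy import stats

def margin_of_error(sigma, n, confidence=0.95):
    """z-based margin of error for a mean: critical value * sigma / sqrt(n)."""
    z_star = stats.norm.ppf((1 + confidence) / 2)
    return z_star * sigma / np.sqrt(n)

sigma = 10.0  # hypothetical population SD
for n in (25, 100, 400):
    for confidence in (0.90, 0.95, 0.99):
        me = margin_of_error(sigma, n, confidence)
        print(f"n = {n:>3}, confidence = {confidence:.0%}: ME = {me:.2f}")
```

Because ME scales with 1/√n, quadrupling the sample size halves the margin of error at a fixed confidence level.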

**Question 5: Describe the purpose and assumptions of an ANOVA test. How does it
extend hypothesis testing to more than two groups?**
-
**Answer:**
-
- **ANOVA (Analysis of Variance)** is used to compare the means of three or more groups to see if at least one group mean is significantly different.

**Assumptions of ANOVA:**
-

1. Samples are independent.

2. Data within each group are approximately normally distributed.

3. Variances are equal across groups (homogeneity of variance).

ANOVA extends hypothesis testing by comparing all groups in a single F-test instead of running multiple pairwise t-tests, which would inflate the overall Type I error rate.

If ANOVA shows a significant difference, we use post-hoc tests (like Tukey’s test) to find which groups differ.
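
A minimal sketch of a one-way ANOVA with `scipy.stats.f_oneway` follows. The three groups are simulated with assumed means and a common standard deviation, and the family-wise error figure is a rough approximation that treats the pairwise tests as independent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three hypothetical groups with a common SD; group_c has a shifted mean
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=50, scale=5, size=30)
group_c = rng.normal(loc=56, scale=5, size=30)

# Single F-test of H0: all three group means are equal
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Approximate chance of at least one false positive across k tests at level alpha
alpha = 0.05
k = 3  # pairwise comparisons among three groups
fwer = 1 - (1 - alpha) ** k
print(f"Approximate family-wise error rate for {k} t-tests: {fwer:.3f}")
```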

**Question 6: Write a Python program to perform a one-sample Z-test and interpret the
result for a given dataset.**
-
**Answer:**
-

```python
import numpy as np
from scipy import stats

# Sample data and the known population parameters under H0
data = [55, 60, 58, 62, 57, 59, 61, 63, 56, 60]
mean_population = 58
std_population = 3
alpha = 0.05  # significance level

# Perform one-sample Z-test
sample_mean = np.mean(data)
n = len(data)
z = (sample_mean - mean_population) / (std_population / np.sqrt(n))

# Two-sided p-value from the standard normal survival function
p_value = 2 * stats.norm.sf(abs(z))

print("Z-score:", z)
print("P-value:", p_value)

if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```


**Interpretation:**
-

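- If the p-value is below 0.05, the sample mean is significantly different from the hypothesized population mean; otherwise, there is not enough evidence of a difference. For this dataset the sample mean is 59.1, giving z ≈ 1.16 and p ≈ 0.25, so we fail to reject the null hypothesis.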

**Question 7: Simulate a dataset from a binomial distribution (n = 10, p = 0.5) using
NumPy and plot the histogram.**
-
**Answer:**
-

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate 1000 draws from Binomial(n=10, p=0.5)
data = np.random.binomial(n=10, p=0.5, size=1000)

# Plot histogram with one bin centered on each integer outcome 0..10
bins = np.arange(-0.5, 11.5, 1)
plt.hist(data, bins=bins, edgecolor='black')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()
```


**Interpretation:**
-
- The histogram will look roughly symmetric around 5 (half of 10), which is the expected number of successes.
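
One way to check the simulated shape is to compare relative frequencies with the theoretical binomial probabilities. The sketch below draws its own sample with a seeded generator (the seed value is arbitrary) and uses `scipy.stats.binom.pmf` for the theoretical values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
n_trials, p_success, size = 10, 0.5, 1000
draws = rng.binomial(n=n_trials, p=p_success, size=size)

# Relative frequency of each outcome 0..10 versus the theoretical PMF
counts = np.bincount(draws, minlength=n_trials + 1)
empirical = counts / size
theoretical = stats.binom.pmf(np.arange(n_trials + 1), n_trials, p_success)

for k, (e, t) in enumerate(zip(empirical, theoretical)):
    print(f"k = {k:>2}: empirical = {e:.3f}, theoretical = {t:.3f}")
print("Sample mean:", draws.mean(), "| expected n*p:", n_trials * p_success)
```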

**Question 8: Generate multiple samples from a non-normal distribution and implement
the Central Limit Theorem using Python.**
-
**Answer:**
-

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate non-normal data (exponential distribution, right-skewed)
data = np.random.exponential(scale=2, size=10000)

# Draw 1000 samples of size 50 (with replacement) and record each sample mean
means = [np.mean(np.random.choice(data, 50)) for _ in range(1000)]

# Plot original vs sample mean distributions
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange')
plt.title("Original Non-Normal Data")

plt.subplot(1, 2, 2)
plt.hist(means, bins=30, color='green')
plt.title("Sampling Distribution of the Mean (n=50)")
plt.tight_layout()
plt.show()
```


**Interpretation:**
-
- Even though the original data is skewed, the distribution of sample means becomes approximately normal, showing the Central Limit Theorem in action.
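
The Central Limit Theorem also predicts the center and spread of the sample means: they should cluster around the population mean with standard deviation σ/√n. A quick numerical check, assuming the same exponential scale of 2 and sample size of 50 (the seed is arbitrary), might look like this:

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility
scale = 2.0        # for an exponential distribution, mean and SD both equal the scale
sample_size = 50
n_samples = 1000

# Each row is one sample; take the mean of every row
sample_means = rng.exponential(scale=scale, size=(n_samples, sample_size)).mean(axis=1)

theory_sd = scale / np.sqrt(sample_size)
print("Mean of sample means:", sample_means.mean(), "| theory:", scale)
print("SD of sample means:  ", sample_means.std(ddof=1), "| theory:", theory_sd)
```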

**Question 9: Write a Python function to calculate and visualize the confidence interval
for a sample mean.**
-
**Answer:**
-

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

def confidence_interval(data, confidence=0.95):
    """Compute a t-based confidence interval for the mean and plot it."""
    mean = np.mean(data)
    std_err = stats.sem(data)
    # Margin of error: t critical value (n - 1 degrees of freedom) times the standard error
    h = std_err * stats.t.ppf((1 + confidence) / 2, len(data) - 1)
    lower = mean - h
    upper = mean + h

    print(f"{confidence:.0%} Confidence Interval: ({lower:.2f}, {upper:.2f})")

    # Plot the mean as a point with the interval as an error bar
    plt.errorbar(1, mean, yerr=h, fmt='o', color='blue', capsize=5)
    plt.title("Confidence Interval for Sample Mean")
    plt.xticks([])
    plt.ylabel("Value")
    plt.show()

    return lower, upper

# Example
data = [12, 15, 14, 10, 13, 15, 14, 16, 12]
lower, upper = confidence_interval(data)
```


**Interpretation:**
-
- This function calculates and displays a t-based confidence interval around the sample mean, which is appropriate when the population standard deviation is unknown. The error bar visually shows the uncertainty in the estimate.

**Question 10: Perform a Chi-square goodness-of-fit test using Python to compare
observed and expected distributions, and explain the outcome.**
-
**Answer:**
-

```python
from scipy.stats import chisquare

# Observed and expected frequencies (totals must match)
observed = [50, 30, 20]
expected = [40, 40, 20]

# Perform Chi-square goodness-of-fit test
chi_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print("Chi-square Statistic:", chi_stat)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject Null Hypothesis - Observed frequencies differ from expected")
else:
    print("Fail to Reject Null Hypothesis - No significant difference from expected")
```


**Interpretation:**
-
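- If the p-value is less than 0.05, the observed frequencies differ significantly from the expected distribution. Otherwise, there is not enough evidence to say they differ. For the data above, χ² = 5.0 with 2 degrees of freedom gives p ≈ 0.082, so we fail to reject the null hypothesis at the 5% level.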
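
The statistic can also be reproduced by hand as a check. The sketch below applies the formula Σ (O − E)² / E to the same observed and expected counts and reads the p-value from the chi-square distribution with (number of categories − 1) degrees of freedom.

```python
import numpy as np
from scipy import stats

observed = np.array([50, 30, 20])
expected = np.array([40, 40, 20])

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_stat_manual = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1
# Upper-tail probability of the chi-square distribution with df degrees of freedom
p_manual = stats.chi2.sf(chi_stat_manual, df)

print("Manual statistic:", chi_stat_manual)
print("Degrees of freedom:", df)
print("Manual p-value:", p_manual)
```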