# Statistics Advance 5 Assignment

## Q1. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5 using Python. Interpret the results.

```python
import numpy as np
from scipy import stats

# Given summary statistics
mean = 50
std = 5
n = 30  # assumption: the question does not state the sample size

# 95% confidence interval using the normal (z) critical value;
# this treats the standard deviation as known (or n as large enough)
confidence = 0.95
z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96
margin_of_error = z * (std / np.sqrt(n))
ci_lower = mean - margin_of_error
ci_upper = mean + margin_of_error

print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
```

**Interpretation:**

The 95% confidence interval is a range of plausible values for the true population mean, based on our sample. With the assumed n = 30, the code prints (48.21, 51.79). The 95% describes the procedure rather than this particular interval: if we repeatedly drew samples of the same size and computed an interval from each, about 95% of those intervals would contain the true mean. A larger sample would give a narrower interval.
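Because the standard deviation of 5 is estimated from the sample, a t-based interval is often preferred when n is modest. The following is a minimal sketch under the same assumed n = 30, using `scipy.stats.t.interval`. It gives a slightly wider interval than the z-based calculation above, because the t critical value for 29 degrees of freedom (about 2.045) exceeds 1.96.

```python
import numpy as np
from scipy import stats

mean = 50
std = 5   # sample standard deviation
n = 30    # assumed sample size, as in the z-based version

# Standard error of the mean
sem = std / np.sqrt(n)

# t-based 95% interval with n - 1 degrees of freedom
ci_lower, ci_upper = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)
print(f"95% t-interval: ({ci_lower:.2f}, {ci_upper:.2f})")
```

As n grows, the t critical value approaches 1.96 and the two intervals converge.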

## Q2. Explain the difference between a one-tailed and a two-tailed hypothesis test. Provide examples of when each would be appropriate.

**Answer:**

- **One-tailed test:** The alternative hypothesis specifies a direction, so the rejection region lies entirely in one tail of the distribution. For example, testing whether a new drug is *better* than the current drug (H₀: new ≤ current, H₁: new > current).
- **Two-tailed test:** The alternative hypothesis allows a difference in either direction, so the rejection region is split between both tails. For example, testing whether a new drug is *different* from the current drug, either better or worse (H₀: new = current, H₁: new ≠ current).

**When to use:**
- Use a one-tailed test only when deviations in one direction matter and that direction is chosen before looking at the data.
- Use a two-tailed test when deviations in both directions are important. It is the usual default.
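The practical difference shows up in the p-value. The following is a minimal sketch, assuming a hypothetical z statistic of 1.8. The two-tailed p-value is twice the one-tailed one, so the same statistic can be significant at α = 0.05 in a one-tailed test but not in a two-tailed test.

```python
from scipy.stats import norm

# Hypothetical test statistic, chosen only for illustration
z = 1.8

# One-tailed (right): area above z only
p_one_tailed = norm.sf(z)

# Two-tailed: area in both tails beyond |z|
p_two_tailed = 2 * norm.sf(abs(z))

alpha = 0.05
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```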

## Q4. What is a p-value? How is it used in hypothesis testing?

**Answer:**

A **p-value** is the probability of obtaining results at least as extreme as the observed results, assuming that the null hypothesis is true. In hypothesis testing, the p-value helps you decide whether to reject the null hypothesis:
- If the p-value is less than the chosen significance level α (e.g., 0.05), you reject the null hypothesis.
- If the p-value is greater than or equal to α, you fail to reject the null hypothesis.

The smaller the p-value, the stronger the evidence against the null hypothesis.
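The decision rule can be sketched end to end with a one-sample t-test. This is a minimal example: the measurements are hypothetical values made up for illustration, and the null value of 50 is an assumed hypothesis.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, made up for illustration only
sample = np.array([52.1, 49.8, 53.4, 51.0, 50.7, 54.2, 52.9, 51.5, 49.9, 53.0])

# H0: population mean = 50, H1: population mean != 50
result = stats.ttest_1samp(sample, popmean=50)
p_value = result.pvalue

# Compare the p-value with the significance level
alpha = 0.05
if p_value < alpha:
    decision = "reject H0"
else:
    decision = "fail to reject H0"
print(f"p = {p_value:.4f} -> {decision}")
```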

## Q5. Calculate the p-value for a z-score of 2.0 in a right-tailed test using Python.

```python
from scipy.stats import norm

z_score = 2.0
# For a right-tailed test, the p-value is the area to the right of z.
# norm.sf(z) is the survival function, equivalent to 1 - norm.cdf(z).
p_value = norm.sf(z_score)
print(f"P-value for z=2.0 (right-tailed): {p_value:.4f}")
```

**Interpretation:**

The code prints a p-value of 0.0228. This means there is about a 2.28% chance of observing a z-score of 2.0 or higher if the null hypothesis is true. At a significance level of 0.05, you would reject the null hypothesis, since 0.0228 < 0.05.
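The definition of the p-value can also be checked by simulation. The following is a minimal sketch, assuming the test statistic is standard normal under H₀. It draws many z values under H₀ and counts the fraction at or above 2.0. That fraction should be close to the exact value from `norm.sf`. The seed is arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Simulate many z statistics under H0 (standard normal) and count how often
# they are at least as extreme as the observed z = 2.0 in the right tail.
rng = np.random.default_rng(seed=0)
z_under_h0 = rng.standard_normal(1_000_000)

observed_z = 2.0
simulated_p = np.mean(z_under_h0 >= observed_z)
exact_p = norm.sf(observed_z)

print(f"simulated p = {simulated_p:.4f}, exact p = {exact_p:.4f}")
```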

## Q6. What is a Type I error and a Type II error? Provide examples of each.

**Answer:**

- **Type I Error (False Positive):** Rejecting the null hypothesis when it is actually true. Its probability is the significance level α. Example: Concluding a new drug works when it actually does not.
- **Type II Error (False Negative):** Failing to reject the null hypothesis when it is actually false. Its probability is denoted β. Example: Failing to detect that a new drug works when it actually does.
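Both error rates can be estimated by repeating a test on simulated data. The following is a minimal sketch, assuming normally distributed data with σ = 1, a hypothetical sample size of 30, and a hypothetical true mean of 0.3 for the "H₀ false" case. The helper `rejection_rate` is written for this illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
alpha = 0.05
n = 30               # hypothetical sample size
n_experiments = 2000

def rejection_rate(true_mean):
    """Fraction of simulated experiments in which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_experiments, n))
    p_values = stats.ttest_1samp(samples, popmean=0, axis=1).pvalue
    return np.mean(p_values < alpha)

# H0 true (mean really is 0): every rejection is a Type I error
type_i_rate = rejection_rate(true_mean=0.0)

# H0 false (mean is 0.3): every non-rejection is a Type II error
type_ii_rate = 1 - rejection_rate(true_mean=0.3)

print(f"Estimated Type I error rate:  {type_i_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

The Type I rate should land near α. The Type II rate depends on the sample size and the true effect.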

## Q7. What is statistical power? How can it be increased?

**Answer:**

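- **Statistical power** is the probability that a test correctly rejects a false null hypothesis (i.e., detects an effect when there is one). It equals 1 − β, where β is the probability of a Type II error.
- **How to increase power:**
  - Increase the sample size
  - Study a larger effect: power grows with the true effect size, although this is often not under the researcher's control
  - Use a higher significance level (α), at the cost of more Type I errors
  - Reduce variability in the data (e.g., more precise measurements)
  - Use a more sensitive design or test (e.g., a paired design, or a one-tailed test when the direction is justified in advance)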
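Several of these levers can be seen in a closed-form power calculation. The following is a minimal sketch, assuming a two-sided one-sample z-test with known σ and a hypothetical standardized effect size d = (μ₁ − μ₀)/σ. `z_test_power` is a helper written for this illustration, not a library function.

```python
import numpy as np
from scipy.stats import norm

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size is the standardized difference (mu1 - mu0) / sigma,
    with sigma assumed known; n is the sample size.
    """
    z_crit = norm.ppf(1 - alpha / 2)
    shift = effect_size * np.sqrt(n)
    # Under H1 the statistic is N(shift, 1); sum the probability of
    # landing in the upper and lower rejection regions
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

for n in (10, 30, 100):
    print(f"n={n:>3}: power={z_test_power(0.3, n):.3f}")
print(f"alpha=0.10, n=30: power={z_test_power(0.3, 30, alpha=0.10):.3f}")
print(f"d=0.6,      n=30: power={z_test_power(0.6, 30):.3f}")
```

Power rises with n, with α, and with the effect size. With an effect size of 0, the formula returns exactly α.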

## Q8. What is the Central Limit Theorem (CLT)? Why is it important in statistics?

**Answer:**

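The **Central Limit Theorem (CLT)** states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the population's distribution, provided the observations are independent and identically distributed with finite variance. For a population with mean μ and standard deviation σ, the sample mean of n observations is approximately normal with mean μ and standard deviation σ/√n.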

**Importance:**
- Allows us to use normal probability theory to make inferences about population means, even when the population is not normally distributed.
- Forms the basis for many statistical tests and confidence intervals.
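The theorem can be illustrated by simulation. The following is a minimal sketch that assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1. It uses a hypothetical sample size of 50 and 10,000 repeated samples. The skewness of the sample means should be far smaller than that of the raw data, and their spread should match σ/√n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Population: exponential with scale 1 (mean 1, sd 1), strongly right-skewed
n = 50               # size of each sample
n_samples = 10_000   # number of repeated samples

samples = rng.exponential(scale=1.0, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# The exponential distribution has skewness 2; the mean of n draws has 2 / sqrt(n)
print(f"Skewness of raw data:     {stats.skew(samples.ravel()):.2f}")
print(f"Skewness of sample means: {stats.skew(sample_means):.2f}")
print(f"Mean of sample means: {sample_means.mean():.3f} (population mean 1)")
print(f"SD of sample means:   {sample_means.std(ddof=1):.3f} "
      f"(CLT predicts 1/sqrt(n) = {1 / np.sqrt(n):.3f})")
```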

## Q11. What is the difference between parametric and non-parametric tests? Give examples of each.

**Answer:**

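- **Parametric tests** assume the data come from a specific family of distributions (often the normal distribution) described by a fixed set of parameters. When those assumptions hold, they are generally more powerful than their non-parametric counterparts.
  - Examples: t-test, ANOVA, Pearson correlation
- **Non-parametric tests** make no assumption about the specific form of the distribution, and many of them work on ranks instead of raw values. They are used when the data do not meet parametric assumptions, for example with strongly skewed data, outliers, or ordinal measurements.
  - Examples: Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis test, Spearman correlation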
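To see the two families side by side, the following minimal sketch runs one test from each on the same samples. The samples are hypothetical and deliberately skewed, drawn from a log-normal distribution chosen for illustration. The parametric test is Welch's two-sample t-test, which compares means without assuming equal variances. The non-parametric test is the Mann-Whitney U test, which works on ranks and makes no normality assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

# Hypothetical skewed data for two groups (log-normal, so not normal)
group_a = rng.lognormal(mean=0.0, sigma=1.0, size=25)
group_b = rng.lognormal(mean=0.5, sigma=1.0, size=25)

# Parametric: Welch's two-sample t-test (no equal-variance assumption)
t_result = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric: Mann-Whitney U test on the ranks
u_result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Welch t-test p-value:   {t_result.pvalue:.4f}")
print(f"Mann-Whitney U p-value: {u_result.pvalue:.4f}")
```

On skewed data like this, the two p-values can differ noticeably, which is one reason to check assumptions before choosing a test.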