# Statistics Advanced 2 - Hypothesis Testing

### **1. What is hypothesis testing in statistics?**

**Hypothesis testing** is a statistical method used to make decisions or inferences about population parameters based on sample data.
It involves stating a hypothesis about the population and then using sample evidence to decide whether to reject it or fail to reject it.

### **2. What is the null hypothesis, and how does it differ from the alternative hypothesis?**

- The **null hypothesis ($H_0$)** is a default statement that there is no effect or no difference.
- The **alternative hypothesis ($H_1$ or $H_a$)** suggests that there is an effect or a difference.
- Hypothesis testing determines whether there is enough statistical evidence to reject $H_0$ in favor of $H_1$.

### **3. What is the significance level in hypothesis testing, and why is it important?**

The **significance level ($\alpha$)** is the probability of rejecting the null hypothesis when it is actually true (Type I error).
- Common choices are 0.05, 0.01, or 0.10.
- It sets the threshold for how extreme the test statistic must be to reject $H_0$.

### **4. What does a P-value represent in hypothesis testing?**

The **P-value** is the probability of observing a test statistic at least as extreme as the one observed,
assuming the null hypothesis is true. It quantifies the evidence against $H_0$.

### **5. How do you interpret the P-value in hypothesis testing?**

- If the **P-value ≤ α**, reject the null hypothesis → sufficient evidence against $H_0$.
- If the **P-value > α**, fail to reject $H_0$ → insufficient evidence against $H_0$.
- A lower P-value indicates stronger evidence against $H_0$.
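
The decision rule above can be sketched in a few lines. The function name `decide` and the example P-value are illustrative assumptions, not part of any library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a given P-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject H0 only when the P-value is at or below the chosen threshold.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# The same P-value can lead to different decisions under different alphas.
print(decide(0.03, alpha=0.05))
print(decide(0.03, alpha=0.01))
```

Note that the significance level should be fixed before looking at the data, otherwise the threshold can be tuned to the result.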

### **6. What are Type 1 and Type 2 errors in hypothesis testing?**

- **Type I Error**: Rejecting $H_0$ when it is true (false positive).
  - Probability = $\alpha$
- **Type II Error**: Failing to reject $H_0$ when it is false (false negative).
  - Probability = $\beta$
- For a fixed sample size, reducing $\alpha$ typically increases $\beta$, and vice versa.
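
The trade-off between the two error types can be illustrated numerically. The sketch below assumes a right-tailed Z-test with known $\sigma$ and a single specific alternative mean; the helper `type_ii_error` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for a right-tailed Z-test of H0: mu = mu0 against mu = mu1 > mu0."""
    std_normal = NormalDist()
    # Critical value: reject H0 when Z exceeds this threshold.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu0, in standard-error units.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    # Probability that Z stays below the threshold although H0 is false.
    return std_normal.cdf(z_crit - shift)


# Made-up scenario: tightening alpha raises beta when n is held fixed.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(mu0=100, mu1=103, sigma=10, n=25, alpha=alpha)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Increasing `n` in this sketch lowers $\beta$ for every $\alpha$, which is why a larger sample is the usual way to reduce both errors at once.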

### **7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?**

- A **one-tailed test** checks for a difference in one specified direction only: either that the parameter is greater than a value, or that it is less than that value.
- A **two-tailed test** checks for a difference in either direction (greater than or less than).
- Two-tailed tests split the significance level equally between the two tails of the distribution ($\alpha/2$ in each).
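
To see how the choice of tail changes the result, here is a minimal sketch that computes the P-value of a Z statistic for each case. The function `z_p_value`, its `tail` labels, and the example value `z = 1.8` are assumptions made for illustration.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str = "two-sided") -> float:
    """P-value of a Z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf
    if tail == "greater":      # H1: parameter > hypothesized value
        return 1 - cdf(z)
    if tail == "less":         # H1: parameter < hypothesized value
        return cdf(z)
    if tail == "two-sided":    # H1: parameter != hypothesized value
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'greater', 'less', or 'two-sided'")


z = 1.8
print(f"one-tailed (greater): {z_p_value(z, 'greater'):.4f}")
print(f"two-tailed:           {z_p_value(z, 'two-sided'):.4f}")
```

With this example value, the one-tailed P-value falls below 0.05 while the two-tailed P-value does not, so the direction of the test has to be chosen before seeing the data.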

### **8. What is the Z-test, and when is it used in hypothesis testing?**

- A **Z-test** is used when the population variance is known and the sample size is large ($n \ge 30$).
- It tests the population mean or proportion using the Z-distribution.
- Common in industrial and quality control settings.

### **9. How do you calculate the Z-score, and what does it represent in hypothesis testing?**

- Formula:  
  $$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
- It represents how many standard errors ($\sigma / \sqrt{n}$) the sample mean lies from the hypothesized population mean.
- A larger absolute Z indicates stronger evidence against $H_0$; whether $H_0$ is rejected depends on the chosen *significance level*.
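
As a worked instance of the formula, the sketch below computes the Z statistic and a two-sided P-value using only the Python standard library. The helper `z_statistic` and the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Z = (x_bar - mu0) / (sigma / sqrt(n)) for a known population sigma."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical data: x_bar = 52, H0: mu = 50, sigma = 6, n = 36.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Z = {z:.2f}, two-sided P-value = {p_two_sided:.4f}")
```

Here the standard error is $6 / \sqrt{36} = 1$, so the sample mean lies two standard errors above the hypothesized mean.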

### **10. What is the T-distribution, and when should it be used instead of the normal distribution?**

- The **T-distribution** is used instead of the normal distribution (and the t-test instead of the z-test) when the population standard deviation is *unknown* and must be estimated from the sample, which matters most when the sample size is small ($n < 30$).
- It is wider and has heavier tails than the normal distribution.
- *As sample size increases, T approaches the standard normal distribution*.

### **11. What is the difference between a Z-test and a T-test?**

| Feature         | Z-test                                | T-test                                |
|-----------------|----------------------------------------|----------------------------------------|
| Population Variance Known  | Yes                                    | No                                     |
| Sample Size     | Large ($n \ge 30$)                    | Small ($n < 30$)                       |
| Distribution    | Standard Normal                       | Student’s t-distribution               |


### **12. What is the T-test, and how is it used in hypothesis testing?**

- A **T-test** is used to determine whether a sample mean differs significantly from a hypothesized value, or whether the means of two groups differ significantly from each other.
- It is applied when the population standard deviation is unknown and the sample size is small.
- Common types: one-sample t-test, two-sample t-test, paired t-test.
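
A one-sample t-test can be sketched by hand with the standard library. The sample values and the hypothesized mean below are invented, and the critical value 2.262 is the tabulated two-sided 5% value for 9 degrees of freedom, hard-coded here as an assumption rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error


# Hypothetical measurements; H0: mu = 12.0, two-sided test at alpha = 0.05.
sample = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9, 12.2, 12.5, 12.1, 12.2]
t_stat = one_sample_t(sample, mu0=12.0)

T_CRIT_DF9 = 2.262  # table value for df = n - 1 = 9 (assumed, not computed)
reject = abs(t_stat) > T_CRIT_DF9
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

In practice a statistics library would supply the exact P-value; the table lookup is used here only to keep the sketch dependency-free.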

### **13. What is the relationship between Z-test and T-test in hypothesis testing?**

- Both are used to test hypotheses about population means.
- **Z-test** assumes known population variance; **T-test** is used when the variance is unknown and the population standard deviation is estimated by the sample standard deviation.
- T-distribution converges to normal distribution as sample size increases.

### **14. What is a confidence interval, and how is it used to interpret statistical results?**

- A **confidence interval (CI)** gives a range of values likely to contain a population parameter.
- It is calculated from the sample statistic ± margin of error.
- A 95% CI means that, if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.

### **15. What is the margin of error, and how does it affect the confidence interval?**

- The **margin of error** is the half-width of the confidence interval: the largest expected difference between the sample statistic and the population parameter at the chosen confidence level.
- Larger margin of error → wider confidence interval.
- It depends on standard deviation, sample size, and confidence level.
- Relation:
  $$CI = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
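
The interval can be computed directly, as in this minimal sketch for a mean with known $\sigma$. A two-sided interval uses the critical value that leaves half of $1 - \text{confidence}$ in each tail; the function name and inputs are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean with known population sigma."""
    # Critical value leaving (1 - confidence) / 2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_crit * sigma / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Hypothetical inputs: x_bar = 52, sigma = 6, n = 36.
low, high = z_confidence_interval(52.0, sigma=6.0, n=36, confidence=0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")

# A higher confidence level widens the interval for the same data.
low99, high99 = z_confidence_interval(52.0, sigma=6.0, n=36, confidence=0.99)
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```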

### **16. How is Bayes' Theorem used in statistics, and what is its significance?**

- **Bayes' Theorem** calculates the probability of an event based on prior knowledge of related conditions.
- Formula:  
  $$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$$
- It is essential in Bayesian inference, machine learning, and decision-making under uncertainty.
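
A common illustration is a diagnostic test for a rare condition. The sketch below applies the formula with $P(B)$ expanded by the law of total probability; the function `posterior` and the rates used are made-up values for illustration.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(A|B) via Bayes' Theorem.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical rates: 1% prevalence, 95% sensitivity, 5% false positives.
p_condition_given_positive = posterior(prior=0.01, likelihood=0.95,
                                       false_positive_rate=0.05)
print(f"P(condition | positive test) = {p_condition_given_positive:.3f}")
```

Even with an accurate test, the low prior keeps the posterior probability well below the test's sensitivity.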

### **17. What is the Chi-square distribution, and when is it used?**

- A **Chi-square distribution** is a continuous distribution used primarily in categorical data analysis.
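- A Chi-square variable with $k$ degrees of freedom is the sum of the squares of $k$ independent standard normal variables.
- Mostly used in goodness-of-fit tests, independence tests, and variance testing. Its shape depends only on the degrees of freedom, and the goodness-of-fit and independence tests are commonly classed as non-parametric because they make no assumption about the distribution of the underlying data.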

### **18. What is the Chi-square goodness of fit test, and how is it applied?**

- The **Chi-square goodness of fit test** evaluates whether an observed frequency distribution differs from a theoretical distribution.
- Test statistic:  
  $$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
  where $O_i$ is the observed count and $E_i$ is the expected count in category $i$.
- Applied to categorical data to test fit to distributions such as the uniform or binomial. It is commonly classed as a non-parametric test.
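
The test statistic is simple to compute by hand. The sketch below checks made-up die-roll counts against a uniform distribution; the critical value 11.070 is the tabulated 5% value for 5 degrees of freedom, hard-coded as an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die; H0: all faces equally likely.
observed = [8, 12, 9, 11, 14, 6]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face

chi2 = chi_square_statistic(observed, expected)
CHI2_CRIT_DF5 = 11.070  # table value for df = 6 - 1 = 5 at alpha = 0.05
print(f"chi-square = {chi2:.2f}, reject H0: {chi2 > CHI2_CRIT_DF5}")
```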

### **19. What is the F-distribution, and when is it used in hypothesis testing?**

- The **F-distribution** is a **right-skewed**, non-negative distribution used in **comparing variances**.
- It is the distribution of the **ratio of two independent chi-square** variables, each divided by its degrees of freedom.
- Commonly used in **ANOVA** and **F-tests**.

### **20. What is an ANOVA test, and what are its assumptions?**

- **ANOVA (Analysis of Variance)** is used to compare means across multiple groups.
- Assumptions:
  - Independence of observations
  - Normally distributed populations
  - Homogeneity of variances (homoscedasticity)
- Uses F-statistic to evaluate group mean differences.
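
The F-statistic for a one-way ANOVA compares variation between group means with variation within groups. The following is a minimal hand-rolled sketch on invented data; the critical value 5.14 is the tabulated 5% value for (2, 6) degrees of freedom and is assumed rather than computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical scores for three independent groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_stat = one_way_anova_f(groups)
F_CRIT_2_6 = 5.14  # table value for df = (2, 6) at alpha = 0.05
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT_2_6}")
```

A large F means the group means differ by more than the within-group noise would explain; it does not say which groups differ.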

### **21. What are the different types of ANOVA tests?**

- **One-way ANOVA**: Compares means of 3 or more independent groups.
- **Two-way ANOVA**: Evaluates the effect of two independent variables.
- **Repeated measures ANOVA**: Tests differences when the same subjects are measured multiple times.

### **22. What is the F-test, and how does it relate to hypothesis testing?**

- An **F-test** is used to compare two variances or test multiple means (as in ANOVA).
- Test statistic: ratio of two variance estimates (for two samples, $F = s_1^2 / s_2^2$).
- Used to determine if variances are equal (homogeneity of variance) and to support ANOVA results.
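
For the two-variance case, the statistic is just the ratio of the sample variances. This sketch assumes a one-tailed test of whether the first population has the larger variance; the data are invented, and the critical value 6.39 is the tabulated 5% value for (4, 4) degrees of freedom.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """F = s_a^2 / s_b^2, using sample variances with the n - 1 divisor."""
    return variance(sample_a) / variance(sample_b)


# Hypothetical samples; H0: equal variances, H1: variance of a is larger.
sample_a = [2, 4, 6, 8, 10]
sample_b = [4, 5, 6, 7, 8]

f_ratio = variance_ratio(sample_a, sample_b)
F_CRIT_4_4 = 6.39  # table value for df = (4, 4) at alpha = 0.05 (assumed)
print(f"F = {f_ratio:.2f}, reject H0: {f_ratio > F_CRIT_4_4}")
```

This test is sensitive to departures from normality, so the result should be read with the normality assumption in mind.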